Introduction to 
Mathematical Statistics 



A WILEY PUBLICATION IN MATHEMATICAL 
STATISTICS 





PAUL G. HOEL 

Professor of Mathematics 
University of California 
Los Angeles 


THIRD EDITION


John Wiley & Sons, Inc. 

New York • London • Sydney



Copyright, 1947, 1954 

BY 

Paul G. Hoel 


Copyright © 1962 

BY 

John Wiley & Sons, Inc. 


All Rights Reserved 

This book or any part thereof must not 
be reproduced in any form without the 
written permission of the publisher. 

THIRD EDITION

SIXTH PRINTING, AUGUST, 1966 


Library of Congress Catalog Card Number: 62-18992 


PRINTED IN THE UNITED STATES OF AMERICA



Preface 


This edition is a rather modest modification of the second edition. The
principal changes have been in organization and coverage of material. 
The organizational changes have resulted in an expansion of the chapter 
on probability, an earlier introduction of expected value techniques, and a 
smoother development of the theory. The coverage of topics has been
modified somewhat to permit the inclusion of a few of the newer and more
useful statistical techniques. In particular, a chapter on the elementary 
parts of general decision methods has been added. 

For the benefit of those who are unacquainted with the earlier editions
of this book, I should explain what my objectives were in the writing of it. 
I hoped to produce a text that would give students with only an elementary 
calculus background an introduction to the mathematical theory of 
statistics and at the same time provide them some experience with applica- 
tions. Students in the physical sciences or engineering usually acquire the 
necessary background by the time they are juniors; however, students of 
the social and life sciences often do not do so until they are farther along 
in their studies. 

The number of topics covered is large for a book of this size. I have 
purposely tried to give elementary treatments of a number of statistical 
techniques so that the student will get a taste of the wide range of such 
methods. I feel that it is more important at this level to give a survey-type
course than it is to concentrate on a few topics at greater depth. 

Although I believe a first course in statistical methods should strive for
breadth rather than depth, I am also of the opinion that students will not
really understand or appreciate the methods unless they apply them 
immediately to concrete problems. I have therefore attempted to illustrate 
and apply the theory as soon as it has been presented, and I have included
a large number of exercises of varying degrees of difficulty. Many of the
exercises are direct applications of formulas to empirical data, others are 
theoretical problems that can be solved by the methods that have been 
presented, and a few are of the type that require considerable ingenuity. 
The instructor is expected to select those exercises that will suit his objec- 
tives in the course. Answers to the even-numbered exercises may be 
obtained in pamphlet form from the publisher. 

From time to time I have received letters from some of the readers of the
earlier editions of this book with suggestions for its improvement. I have 
always appreciated such letters even though I may not have included the 
suggestions in a revision. I wish to take this opportunity to thank all 
who have used my book in the past and particularly those who were kind 
enough to write me concerning it. I am especially grateful to my colleague 
Thomas Ferguson for his numerous helpful suggestions, many of which 
have been incorporated in the present revision. 

Paul G. Hoel 

Los Angeles, California 
September 1962 



Contents 


CHAPTER

1 INTRODUCTION
  References

2 PROBABILITY
  2.1 Introduction
  2.2 Sample Space
  2.3 Sample Space Probabilities
  2.4 Events
  2.5 Addition Theorem
  2.6 Multiplication Theorem
  2.7 Bayes' Formula
  2.8 Combinatorial Formulas
  2.9 Random Variables
  2.10 Frequency Functions
  2.11 Joint Frequency Functions
  2.12 Marginal and Conditional Distributions
  2.13 Continuous Frequency Functions
  2.14 Joint Continuous Frequency Functions
  References
  Exercises

3 NATURE OF STATISTICAL METHODS
  3.1 Mathematical Models
  3.2 Testing Hypotheses
  3.3 Estimation
  References
  Exercises

4 EMPIRICAL FREQUENCY DISTRIBUTIONS OF ONE VARIABLE
  4.1 Introduction
  4.2 Classification of Data
  4.3 Graphical Representation of Empirical Distributions
  4.4 Arithmetical Representation of Empirical Distributions
  References
  Exercises

5 THEORETICAL FREQUENCY DISTRIBUTIONS OF ONE VARIABLE
  5.1 Introduction
  5.2 Discrete Variables
  5.3 Continuous Variables
  5.4 Other Distributions
  References
  Exercises

6 ELEMENTARY SAMPLING THEORY FOR ONE VARIABLE
  6.1 Random Sampling
  6.2 Moments of Multivariate Distributions
  6.3 Properties of E
  6.4 Sum of Independent Variables
  6.5 Distribution of x̄ from a Normal Distribution
  6.6 Distribution of x̄ from Non-normal Distributions
  6.7 Distribution of the Difference of Two Means
  6.8 Distribution of the Difference of Two Proportions
  6.9 Chi-square Distribution
  References
  Exercises

7 CORRELATION AND REGRESSION
  7.1 Linear Correlation
  7.2 Linear Regression
  7.3 Multiple Linear Regression
  7.4 Curvilinear Regression
  7.5 Linear Discriminant Functions
  References
  Exercises

8 THEORETICAL FREQUENCY DISTRIBUTIONS FOR CORRELATION AND REGRESSION
  8.1 Continuous Distributions of Two Variables
  8.2 Normal Distribution of Two Variables
  8.3 Normal Correlation
  8.4 Normal Regression
  References
  Exercises

9 GENERAL PRINCIPLES FOR TESTING HYPOTHESES AND FOR ESTIMATION
  9.1 Testing Hypotheses
  9.2 Estimation
  References
  Exercises

10 TESTING GOODNESS OF FIT
  10.1 The χ² Test
  10.2 Limitations of the χ² Test
  10.3 Applications
  10.4 Generality of the χ² Test
  10.5 Frequency Curve Fitting
  10.6 Contingency Tables
  10.7 Indices of Dispersion
  References
  Exercises

11 SMALL SAMPLE DISTRIBUTIONS
  11.1 Distribution of a Function of Random Variables
  11.2 The χ² Distribution
  11.3 Applications of the χ² Distribution
  11.4 Student's t Distribution
  11.5 Applications of the t Distribution
  11.6 The F Distribution
  11.7 Applications of the F Distribution
  11.8 Distribution of the Range
  11.9 Applications of the Range
  References
  Exercises

12 STATISTICAL DESIGN IN EXPERIMENTS
  12.1 Randomization, Replication, and Sensitivity
  12.2 Analysis of Variance
  12.3 Stratified Sampling
  12.4 Sampling Inspection
  References
  Exercises

13 NONPARAMETRIC METHODS
  13.1 Sign Test
  13.2 Rank Sum Test
  13.3 Runs
  13.4 Serial Correlation
  13.5 Kolmogorov-Smirnov Statistic
  References
  Exercises

14 OTHER METHODS
  14.1 Sequential Analysis
  14.2 Multiple Classification Techniques
  14.3 Bayes Techniques
  References
  Exercises

APPENDIX 1
  1 Properties of r
  2 Likelihood Ratio Test for Goodness of Fit
  3 Cramér-Rao Inequality
  4 Transformations and Jacobians
  5 Independence of x̄ and s² for Normal Distributions

APPENDIX 2
  Tables

ANSWERS TO ODD-NUMBERED EXERCISES

INDEX


CHAPTER 1


Introduction 


Statistical methods are essentially methods for dealing with data that 
have been obtained by a repetitive operation. For some sets of data, the 
operation that gave rise to the data is clearly of this repetitive type. This 
would be true, for example, of a set of diameters of a certain part in a 
mass-production manufacturing process or a set of percentages obtained 
from routine chemical analyses. For other sets of data, the actual operation
may not seem to be repetitive, but it may be possible to conceive of it as
being so. This would be true for the ages at death of certain insurance-policy
holders or for the total number of mistakes an experimental set of
animals made the first time they ran a maze. 

Experience indicates that many repetitive operations or experiments
behave as though they occurred under essentially stable circumstances.
Games of chance, such as coin tossing or dice rolling, usually exhibit this
property. Many experiments and operations in the various branches of
science and industry do likewise. Under such circumstances, it is often
possible to construct a satisfactory mathematical model of the repetitive
operation. This model can then be employed to study properties of the
operation and to draw conclusions concerning it. Although mathematical
models are especially useful devices for studying real-life problems when
the model is realistic of the actual operation involved, it often happens
that such models prove useful even though the operation is not highly
stable.

The mathematical model that a statistician selects for a repetitive 
operation is usually one that enables him to make predictions about the 
frequency with which certain results can be expected to occur when the
operation is repeated a number of times. For example, the model for
studying the inheritance of color in the propagation of certain flowers
might be one that predicted three times as many flowers of one color as of
another color. In the investigation of the quality of manufactured parts 
the model might be one that predicts the percentage of defective parts 
that can be expected in the manufacturing process. 



Because of the nature of statistical data and models, it is only natural 
that probability should be the fundamental tool in statistical theory. 
The statistician looks on probability as an idealization of the proportion 
of times that a certain result will occur in repeated trials of an experiment; 
consequently, a probability model is the type of mathematical model 
selected by him. Because probability is so important in the theory and 
applications of statistical methods, a brief introduction to probability is 
given before the study of statistical methods as such is taken up. 

The idea of a mathematical model for assisting in the solution of real- 
life problems is a familiar one in the various sciences. For example, a 
physicist studying projectile motion often assumes that the simple laws of 
mechanics yield a satisfactory model, in spite of the complexity of the 
actual problem. For more refined work, he introduces a more complicated 
model. Since a model is only an idealization of the actual situation, the 
conclusions derived from it can be relied on only to the extent that the 
model chosen is a sufficiently good approximation to the actual situation 
being studied. In any given problem, therefore, it is essential to be well 
acquainted with the field of application in order to know what models are 
likely to be realistic. This is just as true for statistical models as for models 
in the various branches of science. 

The science student will soon discover the similarity between certain of 
the statistical methods and certain scientific methods in which the scientist 
sets up a hypothesis, conducts an experiment, and then tests the hypothesis 
by means of his experimental data. Although statistical methods are 
applicable to all branches of science, they have been applied most actively 
in the biological and social sciences because the laboratory methods 
of the physical sciences have not been sufficiently broad to treat many of the 
problems of those other sciences. Problems in the biological and social 
sciences often involve undesired variables that cannot be controlled, as 
contrasted to the physical sciences in which such variables can often be 
controlled satisfactorily in the laboratory. Statistical theory is concerned 
not only with how to solve certain problems of the various sciences but also 
with how experiments in those sciences should be designed. Thus the 
science student should expect to learn statistical techniques to assist him 
in treating his experimental data and in designing his experiments in a more 
efficient manner. 

The theory of statistics can be treated as a branch of mathematics in 
which probability is the basic tool; however, since the theory developed 
from an attempt to solve real-life problems, much of it would not be fully 
appreciated if it were removed from such applications. Therefore the 
theory and the applications are considered simultaneously throughout this 
book, although the emphasis is on the theory. 




In the process of solving a real-life problem in statistics three steps may 
be recognized. First, a mathematical model is selected. Second, a check 
is made of the reasonableness of the model. Third, the proper conclusions 
are drawn from this model to solve the proposed problem. In this book
the emphasis is on the first and third steps. In order to do justice to the 
second step, it would be necessary to be well acquainted with the field of 
application. It would also be necessary to know how the conclusions are 
affected by changes in the assumptions necessary for the model. 

Students who have not had experience with applied science are sometimes
disturbed by the readiness with which a statistician will accept
certain of his model assumptions as being sufficiently well satisfied in a 
given problem to justify confidence in the validity of the conclusions. One 
of the striking features of much of statistical theory is that its field of 
application is much broader than the assumptions involved would seem 
to justify. The rapid development of, and interest in, statistical methods 
during the last few decades can be attributed in part to the highly successful 
application of statistical techniques to so many different branches of 
science and industry. 

REFERENCES 

A fuller discussion of some of the preceding ideas may be found in the following books:

Cramér, H., The Elements of Probability Theory and Some of Its Applications, John
Wiley and Sons, Chapters 1 and 2.

Fisher, R. A., Statistical Methods for Research Workers, Oliver and Boyd, Chapter 1.

Kendall, M. G., The Advanced Theory of Statistics, Charles Griffin and Co., pp. 164-166.

Neyman, J., First Course in Probability and Statistics, Henry Holt and Co., pp. 1-6.

Wilks, S. S., Mathematical Statistics, Princeton University Press, pp. 1-4.



CHAPTER 2 


Probability 


2.1 Introduction 

An individual’s approach to probability depends on the nature of his 
interest in the subject. The pure mathematician usually prefers to treat 
probability from an axiomatic point of view, just as he does, say, the 
study of geometry. The applied statistician usually prefers to think of 
probability as the proportion of times that a certain event will occur if the 
experiment related to the event is repeated indefinitely. The approach to 
probability here is based on a blending of these two points of view. 

The statistician is usually interested in probability only as it pertains to 
the possible outcomes of experiments. Furthermore, he is interested in 
only those experiments that are repetitive in nature or that can be conceived 
of as being so. Experiments such as tossing a coin, counting the number of 
defective parts in a box of parts, or reading the daily temperature on a 
thermometer are examples of simple repetitive experiments. An experi- 
ment in which several experimental animals are fed different rations in an 
attempt to determine the relative growth properties of the rations may be 
performed only once with those same animals; nevertheless, the experi- 
ment may be thought of as the first in an unlimited number of similar 
experiments and therefore may be conceived of as being repetitive. 


2.2 Sample Space 

Consider a simple experiment such as tossing a coin. In this experiment 
there are but two possible outcomes, a head and a tail. It is convenient 
to represent the possible outcomes of such an experiment, and experi- 
ments in general, by points on a line or by points in higher dimensions. 
Here it would be convenient to represent a head by the point 1 on the x 
axis and a tail by the point 0. This choice is convenient because the 
number corresponds to the number of heads obtained in the toss. If the
experiment had consisted of tossing the coin twice, there would have been
four possible outcomes, namely HH, HT, TH, TT. For reasons of symmetry,
it would be desirable to represent these outcomes by the points
(1, 1), (1, 0), (0, 1), and (0, 0) in the x, y plane. Figure 1 illustrates this
choice of points to represent the possible outcomes of the experiment.

Fig. 1. A simple sample space.

If the coin were tossed three times, it would be convenient to use three
dimensions to represent the possible experimental outcomes. This representation,
of course, is merely a convenience, and if desired one could just
as well mark off any eight points on the x axis to represent the eight possible 
outcomes. 

Definition: The set of points representing the possible outcomes of an 
experiment is called the sample space, or the event space, of the experiment. 

The idea of a sample space is introduced because it is a convenient 
mathematical device for developing the theory of probability as it pertains 
to the outcomes of experiments. 
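
The sample spaces described above can be enumerated mechanically. The following is a minimal sketch in Python, assuming the coding 1 for a head and 0 for a tail used in the text; the function name `coin_sample_space` is an illustrative choice and not part of the original development.

```python
from itertools import product

def coin_sample_space(tosses):
    """Return every outcome of `tosses` coin tosses as a tuple of 1s (head) and 0s (tail)."""
    return list(product((1, 0), repeat=tosses))

two_tosses = coin_sample_space(2)
three_tosses = coin_sample_space(3)

print(two_tosses)         # the four points plotted in the x, y plane
print(len(three_tosses))  # number of points needed for three tosses
```

Each tuple is one sample point, so two tosses give points in the plane and three tosses give points in three dimensions.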

2.3 Sample Space Probabilities 


Experience shows that for some experiments one possible outcome is 
much more likely to occur than another possible outcome. For example, 
in counting the number of defective screws in a box of screws purchased 
from a reputable firm, one is much more likely to find all good screws than 
all defective screws. In many simple games of chance, however, it often
happens that all the possible outcomes will occur about equally often in a
large number of repetitions of the experiment. Thus, in tossing a die
repeatedly, each of the six sides will usually occur with about the same
frequency. 




Before it is possible to discuss the probability of some combination of 
possible experimental outcomes, it is necessary that probabilities be 
assigned to each of the sample points in the sample space. Since the 
interpretation of probability is going to be in terms of frequency, the 
probability that is assigned to a given sample point should be approxi- 
mately equal to the proportion of times that the sample point will be 
obtained, or is expected to be obtained, in a large number of repetitions 
of the experiment. This frequency interpretation of probability requires 
that probabilities be non-negative and that the sum of the probabilities 
assigned to the sample points be equal to one; hence probabilities must
be assigned with this restriction in mind. In the preceding illustration of
tossing a coin twice, it would be natural to assign the probability of $\frac{1}{4}$ to
each of the four sample points, unless experience has indicated that the 
coin is biased, that is, that one side comes up more frequently than the 
other. The assignment of probabilities to each of the possible outcomes in 
sampling a box of screws for defectives would need to be based on experi- 
ence with the manufacturer’s product. From a mathematical point of 
view, any set of non-negative numbers totaling one may be assigned to 
the sample points as probabilities; however, the conclusions derived from 
the theory are not likely to prove very realistic unless the sample point 
probabilities are chosen in a realistic manner. The assignment of probabil- 
ities to the sample points constitutes the first step in the process of choosing 
a mathematical model for the real-life experiment under consideration. 

Since the development of the theory of probability is especially simple 
when there is only a finite number of sample points, it is assumed in the 
next few sections that the sample space is of this kind. Let the total
number of sample points be denoted by $n$ and let $p_1, p_2, \ldots, p_n$ denote the
probabilities assigned to the respective sample points. In most simple
games of chance the $p$'s are chosen to be equal from symmetry considerations.
Thus in rolling a die one would naturally assign equal probabilities
($\frac{1}{6}$) to the six sample points that constitute the sample space. If experience
with a particular die has shown that the six possible outcomes do not occur
with approximately the same relative frequency, then a set of $p$'s that is
based on this experience should be assigned instead, provided this same 
die is to be used in future experiments. After the sample point probabilities 
have been assigned, one can begin to discuss the probability of events. 
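
The two ways of assigning sample point probabilities, by symmetry and by experience, can be sketched as follows. This is only an illustration: the roll counts below are hypothetical numbers invented for the example, and the names `is_valid_assignment`, `fair_die`, and `empirical_die` are assumptions rather than anything defined in the text.

```python
from fractions import Fraction

def is_valid_assignment(probs):
    """Check the two requirements: non-negative values whose total is one."""
    return all(p >= 0 for p in probs.values()) and sum(probs.values()) == 1

# Symmetric die: equal probabilities chosen from symmetry considerations.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical record of 600 rolls of one particular die (made-up counts).
observed_counts = {1: 90, 2: 95, 3: 100, 4: 100, 5: 105, 6: 110}
total_rolls = sum(observed_counts.values())

# Probabilities based on experience: observed relative frequencies.
empirical_die = {face: Fraction(count, total_rolls)
                 for face, count in observed_counts.items()}

print(is_valid_assignment(fair_die))
print(is_valid_assignment(empirical_die))
print(empirical_die[6])
```

Exact fractions are used so that the check that the probabilities total one is not disturbed by rounding.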


2.4 Events 

Consider an experiment such that whatever the outcome of the experiment
it can be decided whether an event A has occurred. This means
that each sample point can be classified as one for which A will occur or as
one for which A will not occur. Since the sample point probabilities give
the expected relative frequency of occurrence of the corresponding outcomes,
the sum of the sample point probabilities associated with A will
give the expected relative frequency of occurrence of A, and therefore it
should be called the probability of the occurrence of A. These considerations
yield the following basic definition of probability for finite sample
spaces:

(1) Definition: The probability that an event A will occur is the sum of
the probabilities of the sample points that are associated with the occurrence
of A.

In symbols, if $P\{A\}$ denotes the probability that A will occur when the
experiment is performed, then

(2) $P\{A\} = \sum_{A} p_i$

where the sum is over the values of the $p$'s for the sample points corresponding
to the occurrence of A.

As an illustration, suppose a coin is tossed twice and suppose that all
four sample points, as shown in Fig. 1, are assigned the same probability.
Then the probability of getting a total of one head and one tail is $\frac{1}{2}$ because
the two sample points (H, T) and (T, H), with associated probabilities of
$\frac{1}{4}$ each, correspond to the occurrence of the desired event.

As a second illustration, consider the experiment of rolling two dice. 
The sample space here consists of 36 points corresponding to the 36 
possible outcomes that are listed in Table 1. 


Table 1

11  21  31  41  51  61
12  22  32  42  52  62
13  23  33  43  53  63
14  24  34  44  54  64
15  25  35  45  55  65
16  26  36  46  56  66


The first number of each pair denotes the number that came up on one 
of the dice and the second number denotes the number that came up on the 
other. It is assumed that the two dice are distinguishable or are rolled in 
order. The symmetric nature of dice, together with experience in rolling 
them, suggests that it is reasonable to assign the same probability ($\frac{1}{36}$)
to all 36 sample points. Then the probability of getting a total of, say,
seven points on the two dice is $\frac{1}{6}$ because the six sample points 16, 25, 34,
43, 52, 61 correspond to the occurrence of the desired event.
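
Definition (1) and formula (2) translate directly into a short computation. The sketch below, in Python, assumes that a sample space is stored as a dictionary from sample points to probabilities and that an event is given as a true-or-false test on a sample point; the helper name `probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def probability(event, probs):
    """Formula (2): add the probabilities of the sample points for which the event occurs."""
    return sum((p for point, p in probs.items() if event(point)), Fraction(0))

# Coin tossed twice: four sample points, each assigned probability 1/4.
coin_twice = {point: Fraction(1, 4) for point in product("HT", repeat=2)}

# Two distinguishable dice: 36 sample points, each assigned probability 1/36.
two_dice = {point: Fraction(1, 36) for point in product(range(1, 7), repeat=2)}

one_head_one_tail = probability(lambda pt: sorted(pt) == ["H", "T"], coin_twice)
total_of_seven = probability(lambda pt: sum(pt) == 7, two_dice)

print(one_head_one_tail)
print(total_of_seven)
```

The same helper works unchanged when the sample points are not assigned equal probabilities, since it only adds whatever probabilities have been assigned.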





As an illustration in which all sample points are not assigned the same 
probability, consider a pair of modified dice in which each one-spot has 
been changed to a two-spot. As a result, each die will possess two 2’s 
but no 1. In order to compensate for this alteration in Table 1, it is 
necessary merely to replace each 1 by a 2 in that table. The first two rows, 
and also the first two columns, will then become identical. If similar 
expressions are combined, the possible outcomes for this experiment are 
those listed in Table 2. 

Table 2

22(4)  32(2)  42(2)  52(2)  62(2)
23(2)  33(1)  43(1)  53(1)  63(1)
24(2)  34(1)  44(1)  54(1)  64(1)
25(2)  35(1)  45(1)  55(1)  65(1)
26(2)  36(1)  46(1)  56(1)  66(1)


The numbers in parentheses following the outcomes give the number of 
outcomes in Table 1 yielding corresponding outcomes in Table 2. Thus 
a (4) follows the outcome 22 because the events 11, 12, 21, and 22 of 
Table 1 all reduce to 22 when each 1 is replaced by a 2. In view of the 
earlier assumption that each of the 36 possible outcomes of Table 1 will 
occur with the same relative frequency, the natural probabilities to assign 
the possible outcomes listed in Table 2 are those obtained by multiplying
$\frac{1}{36}$ by the numbers in parentheses.

Now if A is the event of getting a total of seven points in the experiment
of rolling the two altered dice, it will follow from Table 2 that

$P\{A\} = \frac{2}{36} + \frac{1}{36} + \frac{1}{36} + \frac{2}{36} = \frac{1}{6}$

because these numbers are the probabilities assigned to the four favorable
outcomes, namely, 25, 34, 43, and 52. This result is the same as that of the
earlier experiment of rolling two normal dice. If B is the event of getting a
total of four points for the experiment of rolling the altered dice, then from
Table 2 it is clear that 22 is the only favorable outcome, hence that

$P\{B\} = \frac{4}{36} = \frac{1}{9}$

This result is not the same as that obtained when two normal dice are
rolled. From Table 1, the latter result is

$P\{B\} = \frac{3}{36} = \frac{1}{12}$
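
The construction of Table 2 from Table 1 can be checked with a short Python sketch. It assumes the altered dice are obtained by replacing every 1 by a 2, as described above; the names `normal`, `altered`, and `probability_of_total` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Table 1: 36 equally likely outcomes for two normal dice.
normal = {pt: Fraction(1, 36) for pt in product(range(1, 7), repeat=2)}

# Replace each 1 by a 2 and count how many outcomes of Table 1 merge into each new outcome.
multiplicity = Counter(tuple(max(face, 2) for face in pt) for pt in normal)

# Table 2: multiply 1/36 by the numbers in parentheses.
altered = {pt: Fraction(count, 36) for pt, count in multiplicity.items()}

def probability_of_total(total, probs):
    """Add the probabilities of the sample points whose faces sum to `total`."""
    return sum((p for pt, p in probs.items() if sum(pt) == total), Fraction(0))

print(multiplicity[(2, 2)], len(altered))
print(probability_of_total(7, altered), probability_of_total(7, normal))
print(probability_of_total(4, altered), probability_of_total(4, normal))
```

The totals of seven agree for the two pairs of dice, whereas the totals of four do not, because the four outcomes 11, 12, 21, and 22 all collapse onto 22.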


2.5 Addition Theorem 

Applications of probability are often concerned with a number of
related events rather than with just one event. For simplicity, consider
two such events, $A_1$ and $A_2$, associated with an experiment. One may be
interested in knowing whether both $A_1$ and $A_2$ will occur when the experiment
is performed. This joint event will be denoted by the product $A_1 A_2$
and its probability by $P\{A_1 A_2\}$. On the other hand, one may be interested
in knowing whether at least one of the events $A_1$ and $A_2$ will occur when
the experiment is performed. This event will be denoted by the sum $A_1 + A_2$
and its probability by $P\{A_1 + A_2\}$. At least one of the two events will
occur if $A_1$ occurs but $A_2$ does not, or if $A_2$ occurs but $A_1$ does not, or if
both $A_1$ and $A_2$ occur. The purpose of this section is to derive a formula
for $P\{A_1 + A_2\}$.

Let the sample space for an experiment be represented by the points in
Fig. 2 and let the sample points corresponding to the occurrence of $A_1$
and $A_2$ be the points interior to the regions labeled $A_1$ and $A_2$, respectively.
The points common to these two regions determine a region that has been
labeled $A_1 A_2$. This notation makes it clear that the region $A_1 A_2$ is part of
the region $A_1$ and also part of the region $A_2$.

From definition (1), it follows that $P\{A_1 + A_2\}$ is the sum of the probabilities
for the sample points lying inside the two regions $A_1$ and $A_2$
combined. Now $P\{A_1\}$ gives the sum of the probabilities for the points
in $A_1$ and $P\{A_2\}$ for the points in $A_2$. The quantity $P\{A_1\} + P\{A_2\}$ would
therefore give the sum of the probabilities for points lying inside the two
regions combined, except for the fact that the probabilities for points
inside the common region $A_1 A_2$ would be summed twice. Since the latter
sum is $P\{A_1 A_2\}$, it is necessary to subtract this amount from the preceding
sum before the correct answer can be obtained. These computations yield
a fundamental theorem of probability known as the addition theorem.

(3) Addition Theorem:

$P\{A_1 + A_2\} = P\{A_1\} + P\{A_2\} - P\{A_1 A_2\}$

Two events $A_1$ and $A_2$ often have no sample points in common. When
this occurs, the events $A_1$ and $A_2$ are said to be mutually exclusive because
if one of the events occurs the other cannot occur. Formula (3) then
reduces to the following formula:

(4) $P\{A_1 + A_2\} = P\{A_1\} + P\{A_2\}$ when $A_1$ and $A_2$ are mutually exclusive

Formulas (3) and (4) can be generalized to more than two events. The 
generalization of (4) is obvious and is used in later work. The generalization
of (3) is more complicated; however, since the generalization is not
needed in later work, it is not considered here. 
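
Formulas (3) and (4) can be verified numerically on the two-dice sample space of Table 1. The following Python sketch assumes equal probabilities for the 36 sample points and represents events as sets of sample points, so that the sum of two events is a set union and the product is a set intersection; the particular events chosen are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two distinguishable dice; every point has probability 1/36.
space = set(product(range(1, 7), repeat=2))

def prob(event):
    """With equal sample point probabilities, the probability is a ratio of point counts."""
    return Fraction(len(event), len(space))

first_six = {pt for pt in space if pt[0] == 6}
second_six = {pt for pt in space if pt[1] == 6}

# Formula (3): the common region is subtracted once because it was summed twice.
lhs = prob(first_six | second_six)
rhs = prob(first_six) + prob(second_six) - prob(first_six & second_six)
print(lhs, rhs)

# Formula (4): a total of seven and a total of eleven have no sample points in common.
total_seven = {pt for pt in space if sum(pt) == 7}
total_eleven = {pt for pt in space if sum(pt) == 11}
exclusive_sum = prob(total_seven | total_eleven)
print(exclusive_sum, prob(total_seven) + prob(total_eleven))
```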


2.6 Multiplication Theorem 


The purpose of this section is to derive a formula for $P\{A_1 A_2\}$ in terms
of probabilities of single events. In order to do so, it is necessary to introduce
the notion of conditional probability. Suppose that one is interested
in knowing whether $A_2$ will occur, subject to the condition that $A_1$ is
certain to occur. Since $A_1$ is certain to occur only when the sample space is
restricted to those points lying inside the region labeled $A_1$ in Fig. 2, it is
necessary to consider how probabilities should be assigned to the points
of this new smaller sample space. If, originally, a sample point in $A_1$ had
been assigned, say, twice as large a probability as another point in $A_1$,
then it should be assigned twice as large a probability in the new sample
space also, because ignoring experimental outcomes that do not yield the
event $A_1$ should not affect the two-to-one ratio of expected frequencies
for those two sample points. It is merely necessary therefore to multiply
the original probabilities assigned to points in $A_1$ by a constant factor $c$
such that the sum of the new probabilities will be one. Thus, if $\pi_i$ denotes
the new probability corresponding to $p_i$ in the original assignment, one
should choose $\pi_i = c p_i$ where

$1 = \sum_{A_1} \pi_i = c \sum_{A_1} p_i = c P\{A_1\}$

As a result, $c = 1/P\{A_1\}$, and therefore

(5) $\pi_i = \frac{p_i}{P\{A_1\}}$


Now that the new sample space has been determined, one can calculate
probabilities in the usual manner by merely applying definition (1). All
such probabilities will be conditional probabilities, subject to the occurrence
of $A_1$. If the probability that $A_2$ will occur, subject to the restriction
that $A_1$ is certain to occur, is denoted by $P\{A_2 \mid A_1\}$, then it follows from
definition (1) and formula (5) that

$P\{A_2 \mid A_1\} = \sum_{A_1 A_2} \pi_i = \frac{\sum_{A_1 A_2} p_i}{P\{A_1\}}$


The first sum is over those $\pi_i$ corresponding to sample points lying inside
$A_1 A_2$ because they are the only sample points inside $A_1$ associated with the
occurrence of $A_2$. Since the numerator sum in the last expression is the
one that defines $P\{A_1 A_2\}$, it follows that the formula for conditional
probability reduces to

(6) $P\{A_2 \mid A_1\} = \frac{P\{A_1 A_2\}}{P\{A_1\}}$


It is assumed here that Ai is an event for which F{A]} ^ 6 . This 
formula, when written in product form, yields the fundamental multiplica- 
tion theorem for probabilities. 

(7) Multiplication Theorem: $P\{A_1 A_2\} = P\{A_1\} P\{A_2 \mid A_1\}$.

Although formula (6) holds only when $P\{A_1\} \neq 0$, formula (7) may be
treated as holding in general if it is agreed to give the right side the value
0 when the factor $P\{A_1\}$ is equal to 0. If the order of the two events is
interchanged, formula (7) becomes

(8) $P\{A_1 A_2\} = P\{A_2\} P\{A_1 \mid A_2\}$
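
The rescaling argument behind formulas (5) through (8) can be checked numerically. The following is a minimal sketch, assuming a two-dice sample space with equal probabilities; the events `a1` and `a2` and the helper names `prob` and `conditional` are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import product

# Assumed example: 36 equally likely sample points for two dice.
space = {pt: Fraction(1, 36) for pt in product(range(1, 7), repeat=2)}

def prob(event, p=space):
    """Definition (1): add the probabilities of the points lying in the event."""
    return sum((p[pt] for pt in p if event(pt)), Fraction(0))

def conditional(a2, a1, p=space):
    """Formulas (5) and (6): rescale the points of A1 by c = 1/P{A1}."""
    c = 1 / prob(a1, p)
    reduced = {pt: c * p[pt] for pt in p if a1(pt)}  # new, smaller sample space
    return prob(a2, reduced)

a1 = lambda pt: pt[0] + pt[1] >= 10   # hypothetical A1: total of at least 10
a2 = lambda pt: pt[0] == 6            # hypothetical A2: first die shows 6

lhs = prob(lambda pt: a1(pt) and a2(pt))   # P{A1 A2}
rhs = prob(a1) * conditional(a2, a1)       # P{A1} P{A2 | A1}, formula (7)
```

Exact fractions are used so that the two sides of formula (7) can be compared without rounding error.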

Now, suppose that $A_1$ and $A_2$ are two events such that
$P\{A_2 \mid A_1\} = P\{A_2\}$ and such that $P\{A_1\} P\{A_2\} > 0$. Then the
event $A_2$ is said to be independent in a probability sense, or more briefly,
independent, of the event $A_1$. This name follows from the property that the
probability of $A_2$ occurring is not affected by adding the condition that
$A_1$ must occur. When $A_2$ is independent of $A_1$, (7) reduces to

(9) $P\{A_1 A_2\} = P\{A_1\} P\{A_2\}$

Conversely, when (9) is true, it follows from comparing (9) and (7) that $A_2$
is independent of $A_1$. If the right members of (8) and (9) are equated, it
will be seen that $P\{A_1 \mid A_2\} = P\{A_1\}$. But this states that the event
$A_1$ is independent of the event $A_2$. Thus, if $A_2$ is independent of $A_1$
it follows that $A_1$ must be independent of $A_2$. Because of this mutual
independence and because (9) implies this independence, it is customary to
define independence in the following manner:

(10) Definition: Two events, $A_1$ and $A_2$, are said to be independent if
$P\{A_1 A_2\} = P\{A_1\} P\{A_2\}$.

Formulas (7) and (10) can be generalized in an obvious manner for more 
than two events by always combining events into two groups. 


2.6.1 Illustrations 

As illustrations of the application of the preceding rules of probability, 
consider a few simple problems related to games of chance. From sym- 
metry considerations, it is usually assumed in such games that all possible 
outcomes should be assigned the same probability. This was done, for 
example, in discussing the probability of events in connection with Table 
1. It was not done in connection with Table 2 because symmetry was 
missing in the experiment that gave rise to Table 2. When all sample 
points are assigned the same probability, the computation of $P\{A\}$ becomes
especially simple because then it reduces to calculating the ratio of the
number of sample points in $A$ to the total number of sample points in the
sample space. This follows directly from formula (2) because then
$p_i = 1/n$ when the total number of sample points is $n$, and therefore the
sum in (2) is equal to $1/n$ times the number of sample points in $A$. In the
following illustrations it is assumed that symmetry is present and therefore 
that probabilities may be calculated in this simple manner. 

(a) If two dice are rolled, what is the probability of getting either a
total of 7 or a total of 11 points? Let $A_1$ and $A_2$ denote the events of
getting a total of 7 and 11 points, respectively. Since these events are
mutually exclusive, formula (4) may be applied. From Table 1 there are
six sample points giving rise to event $A_1$ and two giving rise to $A_2$;
consequently, under the symmetry assumption $P\{A_1\} = \frac{6}{36}$ and
$P\{A_2\} = \frac{2}{36}$. Formula (4) then yields

$P\{A_1 + A_2\} = \frac{6}{36} + \frac{2}{36} = \frac{2}{9}$

This result is, of course, the same as that obtained by counting favorable 
and total outcomes in Table 1 and applying definition ( 1 ) directly. 

(b) If two dice are rolled, what is the probability that each of them will
show at least five points? Let $A_1$ denote the event of getting a 5 or 6 on
the first die and $A_2$ the event of getting a 5 or 6 on the second die. If the
dice are rolled properly, events $A_1$ and $A_2$ may be assumed to be
independent; therefore formula (10) may be applied. Now one can treat the
experiment of rolling two dice as composed of two consecutive independent
experiments in which one die is rolled first and then the second die is
rolled. From this point of view, the event $A_1$ is concerned with the first
experiment only, for which there are six sample points, two of which
correspond to the occurrence of $A_1$. Under the symmetry assumption, it
therefore follows that $P\{A_1\} = \frac{1}{3}$. The event $A_2$ plays the same
role with respect to the second experiment as $A_1$ does in the first; hence
$P\{A_2\} = \frac{1}{3}$. Formula (10) now yields the desired result, namely,

$P\{A_1 A_2\} = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$

This result could also have been obtained directly by counting sample
points in the sample space of Table 1 for the complete experiment. Since
there are 36 sample points and the four sample points given by the
outcomes 55, 56, 65, 66 correspond to the occurrence of $A_1 A_2$, it follows
from definition (1) that $P\{A_1 A_2\} = \frac{4}{36} = \frac{1}{9}$. The advantage
of using formula (10) here is that it enables one to work with simpler sample
spaces than the original sample space. The real purpose of these
illustrations, however, is to develop familiarity with the formulas and not to
simplify calculations, because in many problems the experimental sample
space is not available for a direct application of definition (1).

(c) Two cards are drawn from an ordinary deck of 52 cards but the
first card drawn is replaced before the second card is drawn. What is the
probability that at least one of the cards will be a spade? Let $A_1$ denote
the event of drawing a spade on the first draw and $A_2$ the event of drawing
a spade on the second draw. The problem then is to calculate
$P\{A_1 + A_2\}$ by means of formula (3). As in the preceding illustration, the
complete experiment can be broken down into two consecutive independent
experiments. Here $P\{A_1\} = \frac{13}{52} = \frac{1}{4}$ because $A_1$ is
concerned with the first drawing only and there are 52 sample points, 13 of
which are favorable (spades), in that experiment. Similarly,
$P\{A_2\} = \frac{1}{4}$. Because of the independence of $A_1$ and $A_2$, it
follows from formula (10) that
$P\{A_1 A_2\} = \frac{1}{4} \cdot \frac{1}{4} = \frac{1}{16}$. Application of
formula (3) now gives

$P\{A_1 + A_2\} = \frac{1}{4} + \frac{1}{4} - \frac{1}{16} = \frac{7}{16}$

This problem can also be solved indirectly by first calculating the
probability that neither card drawn will be a spade, which is
$\frac{39}{52} \cdot \frac{39}{52}$, and then subtracting this result from 1.
The reasoning here is that the opposite of “neither card will be a spade” is
“at least one card will be a spade.” Of course, one could also solve this
problem by counting sample points and applying definition (1). The total
number of sample points is 52 · 52, whereas the favorable number is
52 · 52 − 39 · 39 because only those sample points corresponding to a pair of
nonspades are unfavorable.
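
The three routes to the answer in illustration (c) can be compared directly. This is a small sketch using exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

p_spade = Fraction(13, 52)   # probability of a spade on either draw

# Formula (3) with P{A1 A2} taken from formula (10).
by_formula_3 = p_spade + p_spade - p_spade * p_spade

# Indirect route: one minus the probability that neither card is a spade.
by_complement = 1 - Fraction(39, 52) ** 2

# Counting sample points: 52 * 52 ordered pairs, 39 * 39 of them unfavorable.
by_counting = Fraction(52 * 52 - 39 * 39, 52 * 52)
```

All three expressions reduce to the same fraction, which is the point of the illustration.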

(d) Two cards are drawn from a deck of cards. What is the probability
that both cards will be spades? As in the preceding illustration, let $A_1$
and $A_2$ denote the events of getting a spade on the first and second
drawings, respectively. Since the first card drawn is not replaced before the
second drawing, these events are certainly not independent; therefore
formula (7) must be used. As before, the complete experiment may be
treated as composed of two consecutive experiments; however, here they
are not independent experiments. In the first experiment there are 52
sample points, of which 13 correspond to the occurrence of $A_1$; therefore
$P\{A_1\} = \frac{13}{52}$. In calculating $P\{A_2 \mid A_1\}$ it is necessary to
consider only that part of the original sample space for which $A_1$ is certain
to occur. It will contain 13 · 51 sample points because to each of the 13
possible spades that may be obtained on the first drawing there are always 51
remaining cards that may be obtained on the second drawing. There are 13 · 12
points in this sample space that correspond to the occurrence of $A_2$,
because to each of the 13 possible spades that may be obtained on the
first drawing there are always 12 remaining spades that may be obtained
on the second drawing to give 13 · 12 spade pairs. As a result,
$P\{A_2 \mid A_1\} = \frac{13 \cdot 12}{13 \cdot 51} = \frac{12}{51}$. By using
symmetry this computation could have been simplified considerably. Since the
conditional probability of getting a spade on the second drawing, given that a
spade was obtained on the first drawing, should be equal to the corresponding
probability when a particular spade is known to have been obtained on the
first drawing, one could just as well have worked with the sample space in
which a particular spade is obtained on the first drawing. This reduced sample
space contains only 51 points, 12 of which are favorable. The calculation of
$P\{A_2 \mid A_1\}$ now becomes $\frac{12}{51}$. The application of formula (7)
can now be made and yields the result

$P\{A_1 A_2\} = \frac{13}{52} \cdot \frac{12}{51} = \frac{1}{17}$
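
The sample-space counts used in illustration (d) can be reproduced by brute-force enumeration. The sketch below assumes a deck encoded as (rank, suit) pairs, which is an encoding chosen here for convenience rather than anything prescribed by the text.

```python
from fractions import Fraction
from itertools import permutations

# Assumed encoding: 52 cards as (rank, suit) with suits S, H, D, C.
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]

# Ordered pairs of distinct cards: drawing twice without replacement.
draws = list(permutations(deck, 2))

first_spade = [d for d in draws if d[0][1] == "S"]         # A1 occurs
both_spades = [d for d in first_spade if d[1][1] == "S"]   # A1 A2 occurs

p_a1 = Fraction(len(first_spade), len(draws))
p_a2_given_a1 = Fraction(len(both_spades), len(first_spade))
p_both = p_a1 * p_a2_given_a1   # formula (7)
```

Here `len(first_spade)` and `len(both_spades)` play the roles of the 13 · 51 and 13 · 12 counts in the text.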

Hereafter, formulas (4) and (7) will be applied without discussing the 
nature of the various sample spaces involved in the computations. Further- 
more, symmetry considerations such as those used to simplify the pre- 
ceding computations will be used whenever they are advantageous. 
If one is not certain in a given problem that his intuition is correct in 
choosing simple sample spaces, he should go back to the original sample 
space. 

(e) This last illustration is a somewhat more complicated exercise
in the manipulation of formulas (4) and (7). One box contains two red
balls. A second box of identical appearance contains one red and one
white ball. If a box is selected by chance and one ball is drawn from it,
what is the probability that the first box was the selected one, if the drawn
ball turns out to be red? Let $A_1$ denote the event of selecting the first
box and $\bar{A}_1$ that of selecting the second box. Let $A_2$ denote the event
of drawing a red ball and $\bar{A}_2$ that of drawing a white ball. Then the
problem is to calculate the conditional probability $P\{A_1 \mid A_2\}$.
Interchanging $A_1$ and $A_2$ in (6) gives

$P\{A_1 \mid A_2\} = \frac{P\{A_1 A_2\}}{P\{A_2\}}$

The numerator probability may be calculated by using formula (7)
directly, with the understanding that to select a box by chance means that
the probability of selecting, say, the first box is $\frac{1}{2}$. Thus

$P\{A_1 A_2\} = P\{A_1\} P\{A_2 \mid A_1\} = \frac{1}{2} \cdot 1 = \frac{1}{2}$

The denominator probability may be calculated by considering the event
$A_2$ in conjunction with the selection of a box. Now $A_2$ will occur if, and
only if, one of the two mutually exclusive events $A_1 A_2$ and $\bar{A}_1 A_2$
occurs. Thus, by formula (4),

(11) $P\{A_2\} = P\{A_1 A_2\} + P\{\bar{A}_1 A_2\}$

But

$P\{\bar{A}_1 A_2\} = P\{\bar{A}_1\} P\{A_2 \mid \bar{A}_1\} = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

Since $P\{A_1 A_2\}$ was found to be equal to $\frac{1}{2}$, it follows from (11)
and this last result that $P\{A_2\} = \frac{3}{4}$, and hence that

$P\{A_1 \mid A_2\} = \frac{1/2}{3/4} = \frac{2}{3}$


This problem was designed to give practice in the use of the two basic
probability formulas; however, it could have been solved more simply by
looking at the sample space and applying definition (1). A sample space
consisting of the four points I1, I2, II1, II2 would be quite natural here.
The Roman numeral denotes the box number, and the Arabic numeral
the ball number. One would assign equal probabilities to these four
points. The condition $A_2$ restricts the sample space to the first three sample
points if ball number 2 in the second box is understood to be the white
one. Thus each of these three sample points must be assigned the
probability $\frac{1}{3}$. Now, the first two sample points correspond to the
occurrence of $A_1$; hence $P\{A_1 \mid A_2\} = \frac{2}{3}$.
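
The four-point sample space described for illustration (e) is small enough to write out in full. In this sketch the points are encoded as (box, ball) pairs, an assumed encoding, with ball 2 of box 2 taken to be the white one as in the text.

```python
from fractions import Fraction

# Sample points (box, ball) and the color drawn at each point.
color = {(1, 1): "red", (1, 2): "red", (2, 1): "red", (2, 2): "white"}
p = {pt: Fraction(1, 4) for pt in color}   # equal probabilities

p_red = sum(p[pt] for pt in color if color[pt] == "red")   # P{A2}
p_box1_and_red = sum(
    p[pt] for pt in color if pt[0] == 1 and color[pt] == "red"
)                                                          # P{A1 A2}

p_box1_given_red = p_box1_and_red / p_red                  # formula (6)
```

The intermediate values agree with those obtained from formulas (7) and (11) in the longer solution.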


2.7 Bayes’ Formula 

Illustration (e) of the preceding section is typical of problems in which
one looks at the outcome of an experiment and then asks for the
probability that the outcome was due to a particular one of the possible
“causes” of the outcome. Thus in illustration (e) there are two possible
causes, or ways, for a red ball to be obtained, and the problem is to
calculate the probability that it was due to the first one. Although the
solution to the problem was obtained by merely applying the two rules of
probability in the proper sequence, the computations are sufficiently
extensive to make it worthwhile to derive a formula for treating such
problems systematically.

For the purpose of obtaining a formula, let the sample space of the
experiment be divided into $k$ mutually exclusive regions $H_1, H_2, \ldots, H_k$.
These regions represent the $k$ possible causes of an experimental outcome
which are of interest. Thus in Fig. 2 there are four such regions displayed:
the three closed regions bounded by curves and the rest of the sample
space. Next, let $A$ be the event that occurred when the experiment was
performed and consider the problem of calculating the probability that
$H_i$ was the cause of the occurrence of $A$. This means that the sample point
was one of the points inside $H_i$ associated with the occurrence of $A$. From
formula (6) this probability is given by

(12) $P\{H_i \mid A\} = \frac{P\{H_i A\}}{P\{A\}}$

But,

(13) $P\{H_i A\} = P\{H_i\} P\{A \mid H_i\}$

Now the event $A$ can occur only in conjunction with one of the $k$ possible
events $H_1, H_2, \ldots, H_k$. Thus $A$ will occur if, and only if, one of the
mutually exclusive events $H_1 A, H_2 A, \ldots, H_k A$ occurs. The addition rule
for mutually exclusive events therefore gives

$P\{A\} = P\{H_1 A\} + P\{H_2 A\} + \cdots + P\{H_k A\}$

If formula (13) is applied to each term on the right, one will obtain

$P\{A\} = \sum_{j=1}^{k} P\{H_j\} P\{A \mid H_j\}$

The substitution of this formula and formula (13) in (12) yields the desired
formula for calculating probabilities of causes. The result may be
summarized as follows:

Bayes’ Formula: $P\{H_i \mid A\} = \frac{P\{H_i\} P\{A \mid H_i\}}{\sum_{j=1}^{k} P\{H_j\} P\{A \mid H_j\}}$

Illustration (e) of the preceding section, which was solved by the
expeditious use of the two rules of probability, is solved here to illustrate
the direct use of Bayes’ formula. Let $H_1$ and $H_2$ correspond to the events
of getting box number 1 and box number 2, respectively, and let $A$ be the
event of getting a red ball. Since a box is selected by chance,
$P\{H_1\} = P\{H_2\} = \frac{1}{2}$. Further, it is clear from the contents of the
two boxes that $P\{A \mid H_1\} = 1$ and $P\{A \mid H_2\} = \frac{1}{2}$. Bayes’
formula then yields

$P\{H_1 \mid A\} = \frac{\frac{1}{2} \cdot 1}{\frac{1}{2} \cdot 1 + \frac{1}{2} \cdot \frac{1}{2}} = \frac{2}{3}$
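
Bayes’ formula lends itself to a short routine. The following is a sketch only; the function name `bayes` and its list-based arguments are assumptions made for this example.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return P{H_i | A} for every i.

    priors[i] is P{H_i}; likelihoods[i] is P{A | H_i}.
    The causes H_i are assumed mutually exclusive and exhaustive.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # formula (13)
    total = sum(joint)                                     # P{A}
    return [j / total for j in joint]                      # formula (12)

# Two-box illustration: box 1 holds two red balls, box 2 one red and one white.
posterior = bayes(
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1), Fraction(1, 2)],
)
```

The first entry of `posterior` is the probability that the first box was selected, given that a red ball was drawn.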


2.8 Combinatorial Formulas 

The simplest problems on which to develop facility in applying the 
addition and multiplication rules of probability are some of those related 
to games of chance. For many such problems, however, the counting of
sample points corresponding to various events becomes tedious unless 
compact counting methods are developed. A few of the formulas that 
yield such methods are derived in this section. 

2.8.1 Permutations 

Consider a set of $n$ different objects, such as $n$ blocks having different
numbers or colors. Let $r$ of the $n$ objects be selected and arranged in a line.
Such an arrangement is called a permutation of the $r$ objects. If two of the
$r$ objects are interchanged in their respective positions, a different
permutation results. In order to count the total number of permutations, it
suffices to consider the $r$ positions on the line as fixed and then count the
number of ways in which blocks can be selected to be placed in the $r$
positions. Starting from the position farthest to the left, any one of the
$n$ blocks may be chosen to fill this position. After the first position has
been filled, there will be only $n - 1$ blocks left to choose from for the
second position. For each choice for the first position, there are therefore
$n - 1$ choices for the second position; hence $n(n - 1)$ total choices for
the two positions. If this selection procedure is continued, there will be
$n - r + 1$ blocks left to choose from for the $r$th position. If the total
number of such permutations is denoted by ${}_nP_r$, it therefore follows that

(14) ${}_nP_r = n(n - 1) \cdots (n - r + 1)$

The symbol ${}_nP_r$ is usually called the number of permutations of $n$ things
taken $r$ at a time.

As an illustration, suppose one is given the four letters a, b, c, d. The
number of permutations of these four letters taken two at a time is given
by ${}_4P_2 = 4 \cdot 3 = 12$. These permutations are easily enumerated as
follows: ab, ba, ac, ca, ad, da, bc, cb, bd, db, cd, dc.

If $r$ is chosen equal to $n$, (14) reduces to

(15) ${}_nP_n = n(n - 1) \cdots (1) = n!$

In order to permit formulas that involve factorials to be correct even when
$n = 0$, it is necessary to define $0! = 1$. This is consistent for $n = 1$ with
the factorial property that $(n - 1)! = n!/n$.
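
Formulas (14) and (15) can be checked against an explicit enumeration. This sketch writes the product in (14) as a loop; the helper name `n_p_r` is hypothetical, while `math.perm` and `itertools.permutations` are standard-library functions (`math.perm` requires Python 3.8 or later).

```python
from itertools import permutations
from math import factorial, perm

def n_p_r(n, r):
    """Formula (14): the product n(n - 1)...(n - r + 1), with r factors."""
    result = 1
    for k in range(n, n - r, -1):
        result *= k
    return result

# The permutations of the letters a, b, c, d taken two at a time.
listed = ["".join(p) for p in permutations("abcd", 2)]
```

With `r = 0` the loop body never runs and the function returns 1, matching the empty-product convention that also underlies $0! = 1$.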


2.8.2 Combinations 


If one is interested only in what particular objects are selected when
$r$ objects are chosen from $n$ objects, without regard to their arrangement
in a line, then the unordered selection is called a combination. Thus,
if two letters are chosen from the four letters a, b, c, d, the combination ab
is the same combination as ba, but of course it differs from the
combination ac. The total number of combinations possible in selecting $r$
objects from $n$ different objects is denoted by the symbol $\binom{n}{r}$. This
symbol is usually called the number of combinations of $n$ things taken $r$ at a
time.

In order to derive a formula for $\binom{n}{r}$, it suffices to compare the total
number of permutations and total number of combinations possible.
Since a permutation is obtained by first selecting $r$ objects and then
arranging them in some order, whereas a combination is obtained by
performing only the first step, it follows that the total number of
permutations is obtained by taking every possible combination, the total
number of which is $\binom{n}{r}$, and arranging them in all possible ways. But
from (15) the total number of arrangements of $r$ objects in $r$ places is $r!$;
hence the total number of permutations is given by multiplying the number of
combinations, $\binom{n}{r}$, by $r!$. Thus ${}_nP_r = \binom{n}{r} \cdot r!$.
Using formula (14), it therefore follows that

(16) $\binom{n}{r} = \frac{n(n - 1) \cdots (n - r + 1)}{r!}$

Since $n(n - 1) \cdots (n - r + 1) = n!/(n - r)!$, formula (16) may be
written in the following more compact form:

(17) $\binom{n}{r} = \frac{n!}{r!\,(n - r)!}$



As an illustration, the number of combinations of two letters selected
from the four letters a, b, c, d is given by $\binom{4}{2} = 4!/2!\,2! = 6$. The
actual combinations are ab, ac, ad, bc, bd, cd.
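
The relation ${}_nP_r = \binom{n}{r} \cdot r!$ and formula (17) can be verified on the four-letter illustration. A minimal sketch follows; `n_c_r` is a hypothetical helper name, and `math.comb` and `math.perm` need Python 3.8 or later.

```python
from itertools import combinations
from math import comb, factorial, perm

def n_c_r(n, r):
    """Formula (16): divide the number of permutations by r!."""
    return perm(n, r) // factorial(r)

# The unordered selections of two letters from a, b, c, d.
pairs = ["".join(c) for c in combinations("abcd", 2)]

# Formula (17) written out with factorials.
compact = factorial(4) // (factorial(2) * factorial(4 - 2))
```

Integer division is safe here because the number of permutations is always an exact multiple of $r!$.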


2.8.3 Permutations When Some Elements Are Alike 


In the preceding derivations it has been assumed that all the objects
are different. It sometimes happens, however, that the $n$ objects contain a
number of similar objects. Thus one might have five colored balls of
which three are white and two black, instead of five distinct colors. Now
suppose that there are only $k$ distinct kinds of objects and that there are
$n_1$ of the first kind, $n_2$ of the second kind, $\ldots$, and $n_k$ of the $k$th
kind, where $n_1 + n_2 + \cdots + n_k = n$. The total number of different
permutations of these $n$ objects arranged in a line is obviously less than
$n!$. In order to find the total number of distinct permutations, it suffices to
compare the number of permutations now, which is denoted by $P$, with the
number that would be obtained if the like objects were given marks to
distinguish them.

The comparison is similar to that made between ${}_nP_r$ and $\binom{n}{r}$ in
deriving formula (16). Each permutation in the problem under consideration
gives rise to additional permutations when the like objects are made different
by markings. For example, if the $n_1$ similar objects in a permutation are
made different, they can be rearranged in their positions in $n_1!$ ways.
Since this is true for each of the $P$ permutations, there will be $n_1!$ times
as many permutations when the $n_1$ similar objects are made different as
before. In the same manner, the $n_2$ similar objects may be made different
to give $n_2!$ times as many permutations as before. Continuing this
procedure, the total number of permutations after all similar objects have
been made different will be $n_1!\, n_2! \cdots n_k!$ times as large as the number
of permutations before the similar objects were made different; hence the
total number after these changes will be $P\, n_1!\, n_2! \cdots n_k!$. But after
all similar objects have been made different, the total number of
permutations will be the number of permutations of $n$ different things taken
$n$ at a time, which is $n!$. Equating these two results and solving for $P$, one
obtains

(18) $P = \frac{n!}{n_1!\, n_2! \cdots n_k!}$

for the total number of permutations of $n$ things in which there are $n_1$
alike, $n_2$ alike, $\ldots$, $n_k$ alike. As an illustration, consider the number of
permutations of the five letters a, a, a, b, b. Formula (18) yields
$5!/3!\,2! = 10$. These permutations are easily written down: aaabb, aabab,
abaab, baaab, aabba, ababa, baaba, abbaa, babaa, bbaaa.
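
Formula (18) can be compared with a direct enumeration in which the marked permutations are generated and then the marks are erased. This is a sketch; `distinct_permutations` is a name chosen for this example.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_permutations(letters):
    """Formula (18): n! divided by n_1! n_2! ... n_k!."""
    total = factorial(len(letters))
    for n_i in Counter(letters).values():   # n_i = number of objects of kind i
        total //= factorial(n_i)
    return total

# permutations() treats equal letters as marked (distinct) objects;
# collecting the results in a set erases the marks again.
enumerated = sorted({"".join(p) for p in permutations("aaabb")})
```

The 120 marked arrangements collapse to the ten distinct strings listed in the text.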


2.8.4 Illustrations of the Use of Combinatorial Formulas 

(a) Consider a bridge hand consisting of 13 cards chosen from an
ordinary deck. What is the probability that such a hand will contain
exactly seven spades? Since a bridge hand is not concerned with the order
in which the various cards are obtained, the total number of possible
bridge hands is equal to the number of ways of choosing 13 objects from
52 objects, or $\binom{52}{13}$. This is therefore the total number of sample
points in the sample space. The number of hands containing exactly seven
spades is equal to the number of ways of choosing 7 spades from 13 spades, or
$\binom{13}{7}$, multiplied by the number of ways of choosing 6 nonspades from 39
nonspades, or $\binom{39}{6}$. Hence the desired probability is given by

$\frac{\binom{13}{7}\binom{39}{6}}{\binom{52}{13}} = \frac{13!\,39!\,13!\,39!}{7!\,6!\,6!\,33!\,52!}$

(b) What is the probability that a bridge hand will contain at most one
ace? The total number of bridge hands containing at most one ace
consists of those with one ace and those with no ace. The number of hands with
one ace is given by $\binom{4}{1}\binom{48}{12}$, whereas the number with no ace is
given by $\binom{48}{13}$; consequently, the total number of favorable hands is

$\binom{4}{1}\binom{48}{12} + \binom{48}{13}$

Since the total number of possible bridge hands was found earlier to be
$\binom{52}{13}$, the desired probability is given by the ratio

$\frac{\binom{4}{1}\binom{48}{12} + \binom{48}{13}}{\binom{52}{13}}$


(c) If you know that a bridge hand contains at most one ace, what is
the probability that it contains no ace? The number of sample points in
this sample space is given by the numerator of the preceding result. The
number of those corresponding to the desired event, namely bridge hands
with no ace, is given by $\binom{48}{13}$; consequently, the desired probability is
given by the ratio

$\frac{\binom{48}{13}}{\binom{4}{1}\binom{48}{12} + \binom{48}{13}}$
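
The bridge-hand counts in illustrations (a) through (c) are too large to enumerate by hand but are easy to evaluate exactly. The following sketch relies on `math.comb` (Python 3.8 or later); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

hands = comb(52, 13)   # total number of bridge hands

# (a) exactly seven spades, in binomial and in factorial form
p_seven_spades = Fraction(comb(13, 7) * comb(39, 6), hands)
p_seven_spades_factorials = Fraction(
    factorial(13) * factorial(39) * factorial(13) * factorial(39),
    factorial(7) * factorial(6) * factorial(6) * factorial(33) * factorial(52),
)

# (b) at most one ace: exactly one ace, or no ace
at_most_one_ace = comb(4, 1) * comb(48, 12) + comb(48, 13)
p_at_most_one_ace = Fraction(at_most_one_ace, hands)

# (c) no ace, given at most one ace: the sample space shrinks to the hands of (b)
p_no_ace_given_at_most_one = Fraction(comb(48, 13), at_most_one_ace)
```

Calling `float()` on any of these fractions gives a decimal approximation when one is wanted.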



(d) If a coin is tossed five times, what is the probability that three heads
and two tails will be obtained? First, consider a fixed order in which the
desired result can occur, say HHTHT. From (10) the probability of
obtaining this particular order of events is $(\frac{1}{2})^5$. Any other ordering
of these three H's and two T's will have the same probability of being
obtained. Next, consider the number of possible orderings. This number is
equal to the number of permutations of five letters, of which three are
alike and two are alike, which by formula (18) is equal to $5!/3!\,2! = 10$.
Since the 10 orderings constitute the mutually exclusive ways in which the
desired event can occur, formula (4) yields the desired answer, namely,

$10(\frac{1}{2})^5 = \frac{5}{16}$

(e) A pair of coins is tossed 200 times. What is the probability that
exactly $x$ of the 200 tosses will show double heads? As in the preceding
illustration, consider a fixed order in which the desired result can occur,

$\underbrace{SS \cdots S}_{x} \; \underbrace{FF \cdots F}_{200 - x}$

where $S$ denotes a success, that is, a double head, and $F$ a failure, and
where there are $x$ successes and $200 - x$ failures. Because of the
independence of the trials, the probability that this particular ordering will
be obtained is $(\frac{1}{4})^x (\frac{3}{4})^{200 - x}$. The number of such
orderings is equal to the number of permutations of the $S$'s and $F$'s, which
in turn is equal to the number of permutations of 200 things, of which $x$ are
alike and $200 - x$ are alike. By formula (18), this number is
$200!/x!\,(200 - x)!$. Since these orderings constitute the mutually exclusive
ways in which the desired event can occur, it follows that the desired
probability is given by

(19) $\frac{200!}{x!\,(200 - x)!} \left(\frac{1}{4}\right)^x \left(\frac{3}{4}\right)^{200 - x}$
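
Illustrations (d) and (e) follow the same pattern: the probability of one fixed ordering multiplied by the number of orderings. A sketch of that pattern as a function is given below; the name `prob_exact_successes` and its parameters are assumptions for this example.

```python
from fractions import Fraction
from math import comb

def prob_exact_successes(x, n, p):
    """Number of orderings, n!/(x!(n - x)!), times p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Illustration (d): three heads in five tosses of one coin.
five_tosses = prob_exact_successes(3, 5, Fraction(1, 2))

# Illustration (e), formula (19): x double heads in 200 tosses of a pair of coins.
double_heads = [prob_exact_successes(x, 200, Fraction(1, 4)) for x in range(201)]
```

Because the values $x = 0, 1, \ldots, 200$ exhaust the possibilities, the entries of `double_heads` add up to exactly one.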




2.9 Random Variables 

Consider a sample space corresponding to the tossing of two coins and 
suppose that interest is centered on the total number of heads that will be 
obtained. In order to study probabilities of such events, it is convenient
to introduce a variable x to represent the total number of heads obtained. 
If the sample space suggested in 2.2 and displayed in Fig. 1 is used, the 
variable x will assume the value 0 at the sample point (0, 0), the value 1 
at the sample points (1, 0) and (0, 1), and the value 2 at the sample point 
(1, 1). A numerical-valued variable x such as this is an example of a 
random, or chance, variable. 

(20) Definition: A random variable is a numerical-valued variable 
defined on a sample space. 

As an illustration, if $x$ denotes the sum of the points obtained in rolling
two dice, then $x$ is a random variable that can assume integral values from
2 to 12. The sample space here consists of 36 sample points. As another
illustration, if four cards are drawn from a deck and if $x$ denotes the
number of black cards obtained, then $x$ is a random variable that can
assume integral values from 0 to 4. The sample space here consists of
$\binom{52}{4}$ sample points.

The name random, or chance, is given to the variables in these illustra- 
tions because they are defined on sample spaces associated with physical 
experiments in which the outcome of any one experiment is uncertain and 
is therefore said to depend on chance. 

2.10 Frequency Functions 

After a random variable $x$ has been defined on a sample space, interest
usually centers on determining the probability that $x$ will assume specified
values in its range of possible values. From (1), the probability that $x$ will
assume a particular value, say $x_0$, is equal to the sum of the probabilities
for the sample points for which $x = x_0$. The relationship between the
value of $x$ and its probability is expressed by means of a function called
the frequency function, which is defined as follows:

(21) Definition: A function $f(x)$ that yields the probability that the
random variable $x$ will assume any particular value in its range is called the
frequency function of the random variable $x$.




A frequency function often consists of a table of values. Thus,
if two coins are tossed and if $x$ denotes the total number of heads obtained,
it suffices to define $f(x)$ by means of the following set of values:
$f(0) = \frac{1}{4}$, $f(1) = \frac{1}{2}$, $f(2) = \frac{1}{4}$.

In the following chapters, when explicit mathematical models are selected
for experiments, several important frequency functions are given by means
of formulas rather than by tables of values. The function given by (19)
is an example of a frequency function defined by a formula.

In order to judge quickly how a variable is distributed, that is, how its
probability changes as the variable changes, it is convenient to graph the
frequency function $f(x)$ by means of a line graph. As an illustration of
such a graph, let $x$ denote the sum of the points obtained in rolling a pair
of dice. Enumeration of cases and the use of Table 1 will show that
$f(2) = f(12) = \frac{1}{36}$, $f(3) = f(11) = \frac{2}{36}$,
$f(4) = f(10) = \frac{3}{36}$, $f(5) = f(9) = \frac{4}{36}$,
$f(6) = f(8) = \frac{5}{36}$, and $f(7) = \frac{6}{36}$. A line graph of $f(x)$ is
given in Fig. 3.

A function closely related to the frequency function $f(x)$ is the
distribution function $F(x)$. It is defined by the relation

(22) $F(x) = \sum_{t \le x} f(t)$

where the summation occurs over all values of the random variable
that are less than or equal to the specified value of $x$. Thus $F(x_0)$ gives
the probability that the random variable $x$ will assume a value less than or
equal to $x_0$, as contrasted to $f(x_0)$, which gives the probability that $x$
will assume the particular value $x_0$. The function $F(x)$ is called the
distribution function by pure mathematicians, but it is sometimes called the
cumulative distribution function by statisticians. The graph of $F(x)$ for the
dice illustration of the preceding paragraph is given in Fig. 4. It should be
noted that the value of $F(x)$ for $x$ an integer is the upper value rather than
the lower.

Fig. 4. Graph of a distribution function.
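
The frequency function and the distribution function of the dice illustration can both be tabulated from the 36 sample points. This is a minimal sketch; the dictionary `f` and the function `F` simply mirror the notation of definitions (21) and (22).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely sample points give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Frequency function f(x): probability that the total equals x.
f = {x: Fraction(n, 36) for x, n in counts.items()}

def F(x):
    """Distribution function (22): the sum of f(t) over all t <= x."""
    return sum((p for t, p in f.items() if t <= x), Fraction(0))
```

Because the sum includes the term with `t == x`, `F` takes the upper value at each integer, as noted for the graph of the distribution function.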


2.11 Joint Frequency Functions 


Many experiments involve several random variables rather than just 
one. For simplicity, consider two random variables $x$ and $y$. A
mathematical model for these two variables is a function that gives the
probability that $x$ will assume a particular value while at the same time $y$
will assume
a particular value. A function f{x, y) that gives such probabilities is called 
a joint frequency function of the two random variables x and y. The 
adjective joint is usually omitted, since there is little possibility of confusing 
a function of two variables with a function of one variable. 


As an illustration, let $x$ denote the number of spades obtained in drawing
one card from an ordinary deck and let $y$ denote the number of spades
obtained in drawing a second card from the deck, without the first card
being replaced. Then $f(x, y)$ is defined by the following table of values:
$f(0, 0) = \frac{39}{52} \cdot \frac{38}{51}$,
$f(1, 0) = \frac{13}{52} \cdot \frac{39}{51}$,
$f(0, 1) = \frac{39}{52} \cdot \frac{13}{51}$, and
$f(1, 1) = \frac{13}{52} \cdot \frac{12}{51}$. The graph of $f(x, y)$ as a line
graph is given in Fig. 5.

As a second illustration, let $x$ and $y$ denote the number of red and
white balls, respectively, obtained in drawing two balls from a bag
containing two red, two white, and two black balls. Here the joint frequency
function is given by

(23) $f(x, y) = \frac{\binom{2}{x}\binom{2}{y}\binom{2}{2 - x - y}}{\binom{6}{2}}$



This frequency function is defined by a formula; however, it could have
been defined by means of a table of values, as in the first illustration. The
numerator in (23) is obtained by realizing that the $x$ red balls must come
from the two red balls, the $y$ white balls must come from the two white balls,
and the remaining $2 - (x + y)$ balls must come from the two black balls.
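
Formula (23) can be checked against a direct enumeration of the fifteen possible pairs of balls. The sketch below labels the balls R1, R2, W1, W2, B1, B2, which is a labeling assumed here so that like-colored balls can be told apart.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def f(x, y):
    """Formula (23); zero outside the experimentally possible values."""
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    return Fraction(comb(2, x) * comb(2, y) * comb(2, 2 - x - y), comb(6, 2))

bag = ["R1", "R2", "W1", "W2", "B1", "B2"]   # assumed labels
pairs = list(combinations(bag, 2))           # 15 equally likely selections

# For each pair, record (number of red balls, number of white balls).
tally = Counter(
    (sum(b[0] == "R" for b in pair), sum(b[0] == "W" for b in pair))
    for pair in pairs
)
enumerated = {xy: Fraction(n, len(pairs)) for xy, n in tally.items()}
```

The dictionary `enumerated` is the table-of-values form of the same frequency function.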

In much of the statistical theory that is developed in the following
chapters, the variables are unrelated in a probability sense. In the first
illustration the variables $x$ and $y$ would have been such variables if the
first card drawn had been replaced before the second card was drawn. To
say that variables are unrelated in a probability sense means that the
probability of one of the variables assuming a particular value is
independent of the values the other variables assume. Random variables
possessing this property are said to be independently distributed and are
called independent random variables. In order to define independence
more precisely, let $f(x_1, x_2, \ldots, x_n)$ be the joint frequency function of
the indicated variables and let $f_i(x_i)$ denote the frequency function of the
variable $x_i$. The function $f_i(x_i)$ merely gives the probability distribution
of the variable $x_i$ when the remaining variables are ignored.

The essential property of such variables follows from the definition of
independent events given by (10) and may be formalized in the following
manner:

(24) Definition: If the joint frequency function $f(x_1, x_2, \ldots, x_n)$ can be
factored in the form
$f(x_1, x_2, \ldots, x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n)$, where $f_i(x_i)$ is
the frequency function of $x_i$, then the random variables
$x_1, x_2, \ldots, x_n$ are said to be independently distributed.

As an illustration, consider the first of the two preceding illustrations,
modified to the extent that the first card drawn is replaced before the second
card is drawn. The frequency function of $x$, which is denoted by $f_1(x)$, is
given by the formula

$f_1(x) = \frac{\binom{13}{x}\binom{39}{1 - x}}{\binom{52}{1}}$

Since the first card is returned to the deck before the second drawing, the
second drawing does not differ from the first drawing in properties;
consequently, the frequency function of $y$, which is denoted by $f_2(y)$, is
given by the same formula with $x$ replaced by $y$. Since $x$ and $y$ are
obviously independent here,

$f(x, y) = \frac{\binom{13}{x}\binom{39}{1 - x}}{\binom{52}{1}} \cdot \frac{\binom{13}{y}\binom{39}{1 - y}}{\binom{52}{1}}$


As an example in which (24) does not hold, consider the second of the
two preceding illustrations for which the joint frequency function is given
by (23). If $f_1(x)$ denotes the frequency function of $x$ alone, then

$f_1(x) = \frac{\binom{2}{x}\binom{4}{2 - x}}{\binom{6}{2}}$

Similarly, if $f_2(y)$ denotes the frequency function of $y$ alone, then

$f_2(y) = \frac{\binom{2}{y}\binom{4}{2 - y}}{\binom{6}{2}}$

If (24) were to hold here, which means that $f(x, y)$ as given by (23) would
have to be equal to $f_1(x) f_2(y)$, then it would be necessary that

$\frac{\binom{2}{x}\binom{2}{y}\binom{2}{2 - x - y}}{\binom{6}{2}} = \frac{\binom{2}{x}\binom{4}{2 - x}}{\binom{6}{2}} \cdot \frac{\binom{2}{y}\binom{4}{2 - y}}{\binom{6}{2}}$

Furthermore, this relationship would be required to hold for all
experimentally possible values of $x$ and $y$. It obviously does not hold for
$x = 1$ and $y = 1$. As a matter of fact, it does not hold for a single pair of
possible values.

Even though a joint frequency function of two variables may appear to
be the product of a function of x alone and a function of y alone, it does
not necessarily follow that the variables are independently distributed.
A simple illustration of this fact can be obtained by modifying the problem
that was just considered. As before, let x and y denote the number of red
and white balls, respectively, obtained in drawing two balls from the bag,
but now let the bag contain two red and two white balls only. Then the
joint frequency function of x and y is given by

(25) $f(x, y) = \frac{\binom{2}{x}\binom{2}{y}}{\binom{4}{2}}$


This frequency function appears to factor properly for independence, but 
these variables are obviously not independent. As a matter of fact, y is
completely determined by x and could be replaced by $2 - x$ in this formula.
For independence it is necessary that the joint frequency function can be
factored into the product of the individual frequency functions of the two
variables. In this illustration (25) is the frequency function of x alone if y
is replaced by $2 - x$. Since, by symmetry, the frequency function of y alone
is the same as that for x alone, it is clear that $f(x, y)$ is not equal to the
product of the individual frequency functions here and that it cannot be 
made so. 
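
The argument for the bag of two red and two white balls can be checked by direct enumeration. The following is only a sketch: it builds the joint frequency function from the six equally likely pairs of balls rather than from any formula in the text, and the names `joint`, `fx`, `gy`, and `is_independent` are illustrative choices, not notation from the book.

```python
from fractions import Fraction
from itertools import combinations

# Bag with two red (R) and two white (W) balls; two balls are drawn
# without replacement, so the 6 unordered pairs are equally likely.
bag = ["R", "R", "W", "W"]
pairs = list(combinations(range(len(bag)), 2))

# Joint frequency function of x (number of red) and y (number of white).
joint = {}
for pair in pairs:
    x = sum(bag[i] == "R" for i in pair)
    y = sum(bag[i] == "W" for i in pair)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, len(pairs))

# Individual frequency functions obtained by summing out the other variable.
fx, gy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, Fraction(0)) + p
    gy[y] = gy.get(y, Fraction(0)) + p

def is_independent(joint, fx, gy):
    """Check definition (24): f(x, y) must equal f1(x) f2(y) at every point."""
    return all(
        joint.get((x, y), Fraction(0)) == fx[x] * gy[y]
        for x in fx
        for y in gy
    )

print(joint)
print(is_independent(joint, fx, gy))
```

The check fails at a point such as (0, 0), where the joint probability is 0 but the product of the individual frequency functions is not, which agrees with the conclusion that these variables are not independent.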


2.12 Marginal and Conditional Distributions 

In the preceding section it was necessary to obtain the frequency functions
of the individual variables before one could decide whether the
variables in a joint frequency function were independent. Since it is
important to know whether a set of variables is independent, it would be
desirable to have a systematic way of finding the frequency functions of the
individual variables. Such a method is readily obtained by means of formula
(7) for the case of two variables. Although one can easily extend the
method so that it will apply to more than two variables, there is no need
for the extension in later work; therefore the discussion is limited to two
variables.




Consider an experiment for which $A_1$ is the event that a random variable
will assume the value x and $A_2$ is the event that a second random variable
will assume the value y. The multiplication formula

(26) $P\{A_1 A_2\} = P\{A_1\} P\{A_2 \mid A_1\}$

can then be expressed in terms of frequency functions. Since $P\{A_1 A_2\}$ now
gives the probability that the two random variables will assume the values
x and y, respectively, it is equivalent to $f(x, y)$, the value of the joint
frequency function at the point (x, y). Similarly, $P\{A_1\}$ is the probability
that the first variable will assume the value x; therefore it is equivalent to
$f(x)$, the value of the frequency function of the first variable at the point
x. Since $P\{A_2 \mid A_1\}$ gives the probability that the second variable will
assume the value y, given that the first variable is known to have the value
x, it is equivalent to the value of a conditional frequency function, which
is denoted by $f(y \mid x)$. Formula (26) now becomes

(27) $f(x, y) = f(x) f(y \mid x)$

Since $f(y \mid x)$ gives the conditional probability that the second random
variable will assume the value y when the first random variable has the
fixed value x, the sum of $f(y \mid x)$ over all possible values of y for this fixed
value of x must equal 1. Hence, if both sides of (27) are summed over all
possible values of y, the formula for $f(x)$ given below in (28) will be
obtained. In connection with the joint frequency function $f(x, y)$, the
function $f(x)$ is called the x marginal frequency function; however, it is
merely the frequency function of the first random variable. This result
may be expressed as

(28) Marginal Distribution: $f(x) = \sum_y f(x, y)$

In a similar manner, the y marginal frequency function can be obtained,
say $g(y)$, by summing $f(x, y)$ over all values of x with y held fixed. Thus
$g(y) = \sum_x f(x, y)$. These results show that if one has the joint frequency
function of two random variables and if one desires the frequency function
of one of them, it is merely necessary to sum the joint frequency function
over all values of the other variable.

The conditional frequency function $f(y \mid x)$ gives the distribution of the
second random variable when the first variable is held fixed. This distribution
is sometimes called the x array distribution of the joint distribution.
Because of (27), if $f(x) \neq 0$, one may therefore write

(29) Conditional Distribution: $f(y \mid x) = \frac{f(x, y)}{f(x)}$





The conditional distribution for x with y held fixed is given by an analogous
formula. This shows that if one has the joint frequency function of two
variables and desires the conditional frequency function for one of them
when the other is held fixed, it is merely necessary to divide the joint
frequency function by the frequency function of the fixed variable.

For the purpose of illustrating the preceding ideas, suppose that a bag
contains two white and four black balls and that two balls are drawn from
the bag. Let x and y represent the results of the two drawings, 0 corresponding
to a black ball and 1 corresponding to a white ball. Then
every possible result will be represented by one of the four points in the
x,y plane shown in Fig. 6. From the contents of the bag and the order in
which the drawings are made, it follows directly from formula (27) that

$f(0, 0) = f(0) f(0 \mid 0) = \frac{4}{6} \cdot \frac{3}{5} = \frac{6}{15}$

$f(0, 1) = f(0) f(1 \mid 0) = \frac{4}{6} \cdot \frac{2}{5} = \frac{4}{15}$

$f(1, 0) = f(1) f(0 \mid 1) = \frac{2}{6} \cdot \frac{4}{5} = \frac{4}{15}$

$f(1, 1) = f(1) f(1 \mid 1) = \frac{2}{6} \cdot \frac{1}{5} = \frac{1}{15}$

The values of $f(x, y)$ have been graphed in Fig. 6 by means of a simple
line chart.
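
The use of formula (27) in this illustration can be reproduced with exact fractions. This is a minimal sketch under the stated bag contents (two white and four black balls); the function names `first_draw` and `second_given_first` are hypothetical helpers introduced only for this example.

```python
from fractions import Fraction

WHITE, BLACK = 2, 4          # contents of the bag
TOTAL = WHITE + BLACK

def first_draw(x):
    """f(x): probability that the first ball is black (x = 0) or white (x = 1)."""
    return Fraction(WHITE if x == 1 else BLACK, TOTAL)

def second_given_first(y, x):
    """f(y | x): probability of the second result after the first ball is removed."""
    white_left = WHITE - (1 if x == 1 else 0)
    black_left = BLACK - (1 if x == 0 else 0)
    return Fraction(white_left if y == 1 else black_left, TOTAL - 1)

# Formula (27): f(x, y) = f(x) f(y | x) at each of the four sample points.
joint = {
    (x, y): first_draw(x) * second_given_first(y, x)
    for x in (0, 1)
    for y in (0, 1)
}

for point, prob in sorted(joint.items()):
    print(point, prob)
```

Because `Fraction` reduces automatically, the value at (0, 0) prints as 2/5, which is the same number as 6/15.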

In order to illustrate the method of obtaining a marginal distribution 
and a conditional distribution from the joint distribution, assume now 
that only the final values of $f(x, y)$ just calculated are known. Thus the
only information available is that given in Fig. 6. One should erase from 
his mind how these numbers were obtained. 


Fig. 6. Theoretical distribution for two discrete variables.





The x marginal frequency function can be obtained by applying formula
(28). Thus

$f(0) = f(0, 0) + f(0, 1) = \frac{6}{15} + \frac{4}{15} = \frac{2}{3}$

$f(1) = f(1, 0) + f(1, 1) = \frac{4}{15} + \frac{1}{15} = \frac{1}{3}$


If the four points in the x,y plane are thought of as mass points whose 
total mass is 1, then the x marginal distribution represents the distribution 
of mass along the x axis after the points in the x,y plane have been projected
on the x axis.

The conditional frequency function of y for x fixed can be obtained by
applying formula (29) and using the results just obtained. Thus, if x is
assigned the value x = 1,

$f(0 \mid 1) = \frac{f(1, 0)}{f(1)} = \frac{4/15}{1/3} = \frac{4}{5}$

$f(1 \mid 1) = \frac{f(1, 1)}{f(1)} = \frac{1/15}{1/3} = \frac{1}{5}$


Geometrically, $f(y \mid 1)$ represents the distribution of probability mass
along the line x = 1 when the two points on this line have had their
probability masses multiplied by a number, $1/f(1)$, to make the sum of their
masses equal 1.
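
Formulas (28) and (29) can be applied mechanically once the four joint probabilities are known. The sketch below starts only from those four values, in the spirit of "erasing from one's mind" how they were obtained; the helper names `marginal_x` and `conditional_y_given_x` are assumptions made for this example.

```python
from fractions import Fraction

# Joint frequency function for the bag with two white and four black balls.
joint = {
    (0, 0): Fraction(6, 15),
    (0, 1): Fraction(4, 15),
    (1, 0): Fraction(4, 15),
    (1, 1): Fraction(1, 15),
}

def marginal_x(joint):
    """Formula (28): sum f(x, y) over all values of y for each x."""
    result = {}
    for (x, _y), prob in joint.items():
        result[x] = result.get(x, Fraction(0)) + prob
    return result

def conditional_y_given_x(joint, x):
    """Formula (29): divide f(x, y) by f(x); requires f(x) != 0."""
    fx = marginal_x(joint)[x]
    if fx == 0:
        raise ValueError("f(x) is zero, so f(y | x) is undefined")
    return {y: prob / fx for (xx, y), prob in joint.items() if xx == x}

print(marginal_x(joint))                 # f(0) and f(1)
print(conditional_y_given_x(joint, 1))   # f(0 | 1) and f(1 | 1)
```

Dividing by f(1) is exactly the rescaling by 1/f(1) described geometrically above, so the conditional values for x = 1 sum to 1.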

As a second illustration, in which the joint frequency function is given 
directly, consider the frequency function defined by the formula 

$f(x, y) = \frac{1}{27}(x + y + 1)$


where x and y can assume only the integer values 0, 1, or 2. The sample 
space with its probabilities calculated by means of this formula is shown 
in Fig. 7. 

From formula (28), the marginal frequency function of x is given by 


$f(x) = \sum_y \frac{1}{27}(x + y + 1)$

Fig. 7. Sample space for $f(x, y) = (x + y + 1)/27$.



In carrying out the summation it is clear from Fig. 7 that y may range over
all its possible values regardless of the value of x that was fixed; hence

(30) $f(x) = \sum_{y=0}^{2} \frac{1}{27}(x + y + 1) = \frac{1}{9}(x + 2)$

By symmetry, the marginal frequency function of y will be the same as
that for x; hence

$g(y) = \frac{1}{9}(y + 2)$

It is clear that $f(x, y)$ is not equal to the product of its marginal frequency
functions here and therefore that x and y are not independent random
variables.

From formula (29) and the result given in (30), it follows that the
conditional frequency function of y for x fixed is given by

$f(y \mid x) = \frac{\frac{1}{27}(x + y + 1)}{\frac{1}{9}(x + 2)} = \frac{x + y + 1}{3(x + 2)}$

If x is assigned the value 2, for example,

$f(y \mid 2) = \frac{1}{12}(y + 3)$

This function would be useful if one wished to calculate probabilities for 
various values of y when it is known that x has the value 2. It can easily 
be checked that this is a probability function by summing the three
probabilities obtained from this formula by letting y = 0, 1, and 2 and verifying
that the sum is 1.
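
The calculations for this second illustration are easy to verify numerically. The following sketch assumes only the formula f(x, y) = (x + y + 1)/27 on the nine points with x and y in {0, 1, 2}; the function names are illustrative.

```python
from fractions import Fraction

VALUES = (0, 1, 2)

def f(x, y):
    """Joint frequency function (x + y + 1)/27 on the 3-by-3 grid of points."""
    return Fraction(x + y + 1, 27)

def marginal_x(x):
    """Formula (28): sum the joint function over all values of y."""
    return sum(f(x, y) for y in VALUES)

def marginal_y(y):
    """The y marginal, obtained by summing over all values of x."""
    return sum(f(x, y) for x in VALUES)

def conditional(y, x):
    """Formula (29): f(y | x) = f(x, y) / f(x)."""
    return f(x, y) / marginal_x(x)

total = sum(f(x, y) for x in VALUES for y in VALUES)
matches_30 = all(marginal_x(x) == Fraction(x + 2, 9) for x in VALUES)
cond_given_2 = [conditional(y, 2) for y in VALUES]
factors = all(
    f(x, y) == marginal_x(x) * marginal_y(y) for x in VALUES for y in VALUES
)

print(total)                             # total probability
print(matches_30)                        # agreement with formula (30)
print(cond_given_2, sum(cond_given_2))   # f(y | 2) for y = 0, 1, 2 and its sum
print(factors)                           # whether f(x, y) factors as in (24)
```

The three values of f(y | 2) are 3/12, 4/12, and 5/12, and the factoring check fails, which is consistent with the statement that x and y are not independent here.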

It was a simple matter to find the marginal frequency function in the 
preceding problem because the sum over y in formula (28) was over all 
possible values of y regardless of the fixed value of x. The problem would 
have been somewhat more difficult if the frequency function had been
given by the formula

$f(x, y) = \frac{1}{18}(x + y + 1)$

and the sample space had been the one shown in Fig. 8. This sample
space differs from the one in Fig. 7 in that y is not permitted to exceed x
in value.

Fig. 8. Sample space for $f(x, y) = (x + y + 1)/18$.

Now if one wanted the marginal value $f(1)$, the sum in (28)
would become

$f(1) = \sum_{y=0}^{1} \frac{1}{18}(y + 2) = \frac{5}{18}$

However, if one wanted the marginal value $f(2)$, the sum would become

$f(2) = \sum_{y=0}^{2} \frac{1}{18}(y + 3) = \frac{12}{18}$

Thus the values of y over which the sum is to be taken depend upon what 
marginal value of x is desired. Although the simplest procedure here is to 
perform the summation separately for each value of x, one can sum for a 
general x. Calculations for a general x will show that the marginal function
can be expressed by means of the formula

$f(x) = \frac{1}{36}(x + 1)(3x + 2)$
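
The dependence of the summation limits on x can be made explicit in a short check. This sketch assumes the triangular sample space described above (integer points with 0 ≤ y ≤ x ≤ 2) and compares the sum over y with the closed form (x + 1)(3x + 2)/36; the names `marginal_x` and `closed_form` are chosen only for this example.

```python
from fractions import Fraction

def f(x, y):
    """(x + y + 1)/18 on the points with 0 <= y <= x <= 2, and 0 elsewhere."""
    if 0 <= y <= x <= 2:
        return Fraction(x + y + 1, 18)
    return Fraction(0)

def marginal_x(x):
    """Formula (28), where y now ranges only from 0 to x."""
    return sum(f(x, y) for y in range(0, x + 1))

def closed_form(x):
    """Marginal function expressed for a general x."""
    return Fraction((x + 1) * (3 * x + 2), 36)

for x in (0, 1, 2):
    print(x, marginal_x(x), closed_form(x))
```

The three marginal values are 1/18, 5/18, and 12/18 (printed in lowest terms), and they add to 1 as they should.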


2.13 Continuous Frequency Functions 

Thus far the discussion of probability has been confined to finite sample 
spaces. This simplification made it possible to derive the fundamental 
rules of probability in an elementary manner. It is assumed hereafter that 
these rules may also be applied to sample spaces in which there may be an 
infinite number of sample points. As an illustration of a problem for 
which this extension of the applicability of the rules of probability is 
needed, calculate the probability that the first head obtained in tossing a 
coin repeatedly will occur on or before the fourth toss. Here the sample 
space might conveniently consist of the infinite number of sample points 
represented by the infinite sequence of outcomes H, TH, TTH, TTTH, 
• • • . If it is assumed that the coin is not biased, the probabilities that
would be assigned to these sample points are $(\frac{1}{2})^1, (\frac{1}{2})^2, (\frac{1}{2})^3, \ldots$. It will
be observed that the sum of these sample point probabilities is 1, as it
should be. The random variable x here is a variable that can assume any
one of the values 1, 2, 3, • • • , and the problem is to calculate the value of $P\{x \le 4\}$.

The random variable of the preceding illustration is an example of a 
discrete variable. A discrete random variable is a random variable that can 
assume only a finite number, or an infinite sequence, of distinct values. 
This means that the values can be arranged in a definite order. 
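
For the coin-tossing illustration, the required probability is a finite sum of the sample point probabilities. A minimal sketch, assuming an unbiased coin as in the text (the function names are hypothetical):

```python
from fractions import Fraction

def p_first_head_on(k):
    """Probability that the first head occurs on toss k for an unbiased coin."""
    return Fraction(1, 2) ** k

def p_first_head_by(n):
    """Probability that the first head occurs on or before toss n."""
    return sum(p_first_head_on(k) for k in range(1, n + 1))

print(p_first_head_by(4))    # 15/16
print(p_first_head_by(30))   # partial sums approach 1 as n grows
```

The partial sums equal 1 − (1/2)^n, which illustrates why the infinitely many sample point probabilities sum to 1.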

Although the extension of the applicability of the rules of probability 
as indicated above enables one to consider a much larger class of problems 
than before, there are many important classes of problems that are still 




not covered. These problems involve sample spaces that contain all the
points in an interval or intervals rather than just a discrete set of points.
For example, suppose an experiment consists in the weighing of an adult
male from the population of a given city. Although there is only a finite
number of individuals in the city, hence only a finite number of possible
outcomes of the experiment, the mathematical model for such an experiment
is much simpler if one conceives of an infinite number of individuals
and of all possible weights in some interval as being possible outcomes of
the experiment. If the random variable x representing the weight of an
individual is introduced, then this assumes that x could take on any value in
the interval, say, 150 to 160 pounds. A random variable that may assume
any value in some interval or intervals is called a continuous random
variable. Such variables as weights, lengths, temperatures, and velocities,
which involve measurement, are considered to be continuous. Although
there are variables that are a mixture of the discrete and continuous types,
the important problems in statistics usually involve either one or the other;
hence only these two distinct types are considered.

For the purpose of discussing properties of continuous variables,
consider a particular continuous random variable x representing the
thickness of a metal washer obtained from a machine that is turning out
washers. If the machine were permitted to turn out, say, 100 washers, and
if the thicknesses of these 100 washers were measured to the nearest .001
inch, there would be available 100 values of x with which to study the
behavior of the machine. If these 100 values were collected and represented
in table form, one might find a table of values such as that displayed in
Table 3 that gives the absolute frequency f with which various values of
x occurred.

Table 3

x    .231  .232  .233  .234  .235  .236  .237  .238  .239
f       1     2     8    18    28    24    13     4     2

The word “frequency” usually implies the ratio of the observed
number of values of x to the total number of observational values; however,
it is also used to denote the numerator of this ratio. If throughout
the subsequent chapters there is any question which meaning is being used,
the words “relative frequency” and “absolute frequency” will be employed.
Absolute frequencies are recorded in Table 3.

For the purpose of displaying these results graphically, a graph called a
histogram is used. A histogram is a graph of the type shown in Fig. 9, in
which areas are used to represent observed frequencies, particularly relative
frequencies. Thus the area of the rectangle that is centered at x = .234
should equal the relative frequency .18; however, in practice it is customary
to choose any convenient unit on the y axis, with the result that the areas
of the rectangles may be only proportional to the corresponding frequencies
rather than equal to them. The histogram shown in Fig. 9 for
the data of Table 3 has been constructed with such a convenient choice of
units; hence areas there are only proportional to frequencies.

Fig. 9. Histogram for Table 3.

If the histogram is to be constructed so that areas will be equal to 
relative frequencies, then the total area of the histogram must equal 1 
because the sum of the relative frequencies must equal 1. If h denotes the
distance between consecutive x values, the height of the rectangle centered
at, say, $x_i$ will be $f_i/(Nh)$, where $f_i$ denotes the absolute frequency of $x_i$
and N denotes the total number of observations. This
result is obvious when it is realized that this ordinate when multiplied by
the base h must equal the relative frequency $f_i/N$.
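
The height rule for a histogram of unit area can be illustrated with the washer data. This is a sketch under two assumptions: the frequencies are those of Table 3 (1, 2, 8, 18, 28, 24, 13, 4, 2), and the x values run from .231 to .239 in steps of .001; the variable names are illustrative.

```python
from fractions import Fraction

# Table 3: thickness values (inches) and their absolute frequencies.
# Assumption: the x values are equally spaced from .231 to .239.
x_values = [Fraction(k, 1000) for k in range(231, 240)]
abs_freq = [1, 2, 8, 18, 28, 24, 13, 4, 2]

N = sum(abs_freq)                 # total number of observations
h = x_values[1] - x_values[0]     # distance between consecutive x values

# Height of each rectangle so that its area equals the relative frequency.
heights = [Fraction(f_i) / (N * h) for f_i in abs_freq]
areas = [height * h for height in heights]

for x, height, area in zip(x_values, heights, areas):
    print(float(x), float(height), float(area))

print(sum(areas))   # total area of the histogram
```

With these units the rectangle centered at x = .234 has height 180 and area .18, and doubling the number of runs would leave the total area unchanged.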

The histogram of Fig. 9 indicates the frequency with which various 
values of x were obtained for 100 runs of the experiment. If 200 runs had 
been made, the resulting histogram would have been twice as large as 
that based on 100 runs. In order to compare histograms based on dif- 
ferent numbers of experiments, it is necessary to choose units on the y 
axis, as discussed in the preceding paragraph, in such a manner that the 
area of the histogram will always be equal to one. With this choice of 
units, the histogram would be expected to approach a fixed histogram as 
the number of runs of the experiment is increased indefinitely. Furthermore,
if it is assumed that x can be measured as accurately as desired so
that the unit on the x axis, h, can be made as small as desired, then the
histogram would be expected to smooth out and approximate a continuous
curve as the number of runs of the experiment is increased indefinitely
and h is chosen very small. Such a curve is thought of as an idealization
for the relative frequency with which different values of x would be expected
to be obtained for runs of the actual experiment.

When the area of the histogram is made equal to 1, it follows from the
preceding discussion that the sum of the areas of several neighboring
rectangles is equal to the relative frequency with which the value of x
was observed to lie in the interval that forms the base of those rectangles.
Since this property will continue to hold as the number of runs of the
experiment increases indefinitely, the area under the expected limiting, or
idealized, curve between any two given values of x should be equal to the
relative frequency with which x would be expected to lie in the interval
determined by those values of x. The function $f(x)$ whose graph is conceived
as being the limiting form of the histogram is treated as the mathematical
model for the continuous random variable x and is called the
frequency function of the variable. Since relative frequency in the case
of a histogram is replaced by probability in the case of a mathematical
model, the definition of a frequency function for a continuous variable
may be stated in the following form:

(31) Definition: A frequency function for a continuous random variable x
is a function $f(x)$ that possesses the following properties:

(i) $f(x) \ge 0$

(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$

(iii) $\int_a^b f(x)\,dx = P\{a < x < b\}$

where a and b are any two values of x, with a < b.


Property (i) is obviously necessary since negative probability has no
meaning. Property (ii) corresponds to the requirement that the probability
of an event that is certain to occur should be equal to one. Here x
is certain to assume some real value when an observation of it is made.
Although it is certain to assume some value, the probability that it will
assume a stated value is 0 for a continuous random variable. At first
this may seem somewhat paradoxical, but if one wants the probability
that x will assume some value in the interval from $x_0$ to $x_0 + \Delta x$, it is given
by the integral

$\int_{x_0}^{x_0 + \Delta x} f(x)\,dx$

The mean value theorem of integral calculus may be applied here under
the assumption that $f(x)$ is a continuous function to give the value

$\Delta x\, f(x_0 + \theta\,\Delta x), \quad 0 < \theta < 1$





But when $\Delta x$ is allowed to approach 0, this probability will approach 0,
and therefore the probability that x will assume the particular value $x_0$
must certainly be 0. Thus, in dealing with continuous random variables,
one asks only for the probability that the variable will lie in some interval
or intervals. As a result, probabilities for continuous variables are always
given by integrals, whereas those for discrete variables are given by sums.
If the range of x is not the entire real line, it is assumed that $f(x)$ is defined
to be equal to 0 for those values outside the specified range of the
variable.

As an illustration, consider the possibility of using $f(x) = ke^{-x}$ as a
frequency function for x where k is some constant. From (i) it is clear that
k must be positive. Since the integral of $e^{-x}$ from $-\infty$ to $+\infty$ is infinite, it
follows that the range of x must be restricted; hence assume, for example,
that x can take on only non-negative values. Then $f(x)$ will be defined to
be 0 for negative values and to be given by the formula for non-negative
values. From (ii) it then follows that k must be equal to 1 because the
integral of $e^{-x}$ from 0 to $\infty$ is equal to 1. The calculation of, say,
$P\{1 < x < 2\}$ would then become

$\int_1^2 e^{-x}\,dx = e^{-1} - e^{-2} = .23$

The graph of this frequency function and the representation of
$P\{1 < x < 2\}$ as an area is given in Fig. 10.
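
The value .23 can be confirmed both from the closed form and by approximating the area under the curve. The sketch below uses a simple midpoint rule, which is one convenient numerical choice and not a method taken from the text; `midpoint_integral` is a hypothetical helper.

```python
import math

def f(x):
    """Frequency function e^{-x} for non-negative x, and 0 for negative x."""
    return math.exp(-x) if x >= 0 else 0.0

def midpoint_integral(func, a, b, n=10_000):
    """Approximate the integral of func over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

exact = math.exp(-1) - math.exp(-2)        # closed form for P{1 < x < 2}
approx = midpoint_integral(f, 1.0, 2.0)    # area under the curve from 1 to 2
total = midpoint_integral(f, 0.0, 50.0)    # property (ii), truncated at x = 50

print(round(exact, 4), round(approx, 4), round(total, 4))
```

Truncating the upper limit at 50 is an approximation for the infinite range; the neglected tail is of the order of $e^{-50}$.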

Although f{x) may be chosen at will in any given problem, a choice for 
which the resulting probabilities are not approximated well by observed 
relative frequencies is not likely to be a useful choice. As in the case of 
discrete variables, there are particular frequency functions that have 
proved very useful in statistical work and whose explicit formulas are 
considered later. 



Fig. 10. Graph of a frequency function for a continuous variable. 



The frequency function for a continuous variable is often called the 
probability density function, or density function, of the variable; however,
it is very convenient, and becoming increasingly common, to use
only the single name “frequency function” for both discrete and con- 
tinuous variables. 

The distribution function, $F(x)$, for the continuous variable x is defined
by

(32) $F(x) = \int_{-\infty}^{x} f(t)\,dt$

The graph of $F(x)$ for the preceding illustration is given in Fig. 11. It
should be noted that $P\{1 < x < 2\}$ is now given by $F(2) - F(1)$, that is,
by the difference of the ordinates on the graph of $F(x)$. Here the graph
was constructed by first determining $F(x)$ from definition (32). Thus

$F(x) = \int_0^x e^{-t}\,dt = 1 - e^{-x}, \quad x \ge 0$

$F(x) = 0, \quad x < 0$

The frequency function is the one commonly used in the applications
of statistical theory; however, the distribution function is also very useful
in deriving some of that theory. For example, it is often easier to find the
distribution function of a random variable than it is to find the frequency
function. But after the distribution function has been found, the frequency
function can be obtained by differentiating the distribution function,
since, by employing a familiar calculus formula for differentiating
an integral with respect to its variable upper limit, it follows from (32)
that

$\frac{dF(x)}{dx} = f(x)$

This technique, of course, cannot be used on discrete variable distributions.
For such distributions it is necessary to take differences of $F(x)$
values to obtain $f(x)$ values.
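
The relation between the distribution function and the frequency function can be checked numerically for the illustration $f(x) = e^{-x}$. In this sketch the derivative is approximated by a central difference, which is an assumption of the example rather than a technique from the text.

```python
import math

def F(x):
    """Distribution function 1 - e^{-x} for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Frequency function e^{-x} for x >= 0, and 0 for x < 0."""
    return math.exp(-x) if x >= 0 else 0.0

def central_difference(func, x, step=1e-5):
    """Numerical approximation to the derivative of func at x."""
    return (func(x + step) - func(x - step)) / (2 * step)

# P{1 < x < 2} as a difference of ordinates on the graph of F.
prob = F(2) - F(1)

# Compare the numerical slope of F with f at a few positive points.
slopes = [(x, central_difference(F, x), f(x)) for x in (0.5, 1.0, 2.0)]

print(round(prob, 4))
for x, slope, density in slopes:
    print(x, round(slope, 6), round(density, 6))
```

The comparison is made only at interior points x > 0; at x = 0 the graph of F has a corner, so the derivative does not exist there.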



Fig. 11. Graph of the distribution function for a continuous variable. 





2.14 Joint Continuous Frequency Functions 


A frequency function for several continuous variables is the natural
generalization of a frequency function for one variable. Thus a frequency
function for two variables x and y would be denoted by $f(x, y)$ and would
be represented geometrically by a surface in three dimensions, just as a
frequency function of one variable, $f(x)$, was represented by a curve in
two dimensions. The volume under the surface lying above the rectangle
determined by $a < x < b$ and $c < y < d$ would give the probability that
the random variables x and y will assume values corresponding to points
lying inside this rectangle. The essential properties for a frequency
function of several variables may be formalized as follows:

(33) Definition: A frequency function for n continuous random variables
$x_1, x_2, \ldots, x_n$ is a function $f(x_1, x_2, \ldots, x_n)$ that possesses the following
properties:

(i) $f(x_1, x_2, \ldots, x_n) \ge 0$

(ii) $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = 1$

(iii) $\int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = P\{a_1 < x_1 < b_1, \ldots, a_n < x_n < b_n\}$

As an illustration, consider the function $f(x, y) = e^{-(x+y)}$, which is a
two-dimensional generalization of the example used in the preceding
section. If $f(x, y)$ is defined to be zero for negative values of x and y, it will
be observed that (i) and (ii) are satisfied. From (iii), the calculation of,
say, $P\{1 < x < 2,\ 0 < y < 2\}$ will then be given by

$\int_0^2 \int_1^2 e^{-(x+y)}\,dx\,dy = (e^{-1} - e^{-2})(e^{0} - e^{-2}) = .20$

The graph of $f(x, y)$ and the representation of $P\{1 < x < 2,\ 0 < y < 2\}$
as a volume is given in Fig. 12.
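
The volume interpretation can be illustrated by approximating the double integral directly. The sketch below uses a two-dimensional midpoint rule, chosen here only for simplicity; `midpoint_double_integral` is a hypothetical helper and the grid size is an arbitrary choice.

```python
import math

def f(x, y):
    """Joint frequency function e^{-(x+y)} for x, y >= 0, and 0 elsewhere."""
    return math.exp(-(x + y)) if x >= 0 and y >= 0 else 0.0

def midpoint_double_integral(func, ax, bx, ay, by, n=200):
    """Approximate the volume under func above [ax, bx] x [ay, by]."""
    dx = (bx - ax) / n
    dy = (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * dx
        for j in range(n):
            y = ay + (j + 0.5) * dy
            total += func(x, y)
    return total * dx * dy

# Closed form: the product of the two one-dimensional probabilities.
exact = (math.exp(-1) - math.exp(-2)) * (math.exp(0) - math.exp(-2))
approx = midpoint_double_integral(f, 1.0, 2.0, 0.0, 2.0)

print(round(exact, 4), round(approx, 4))
```

That the closed form is a product of a factor in x and a factor in y reflects the fact that this joint frequency function factors into $e^{-x}$ and $e^{-y}$.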

Continuous random variables that are unrelated in a probability sense 
are said to be independently distributed, just as in the case of discrete 
random variables. To say that continuous random variables are un- 
related in a probability sense means that the probability that one of the 
variables will assume a value in a given interval is independent of the 
values the other variables assume. In order that this property shall hold 
it suffices to define independence here exactly as it was done for discrete 
variables; hence definition (24) applies to continuous variables also. 






For the purpose of showing that the desired property holds, let
$f(x_1, x_2, \ldots, x_n)$ be a frequency function satisfying (24). Then property (iii) of
(33) implies that

$P\{a_1 < x_1 < b_1, \ldots, a_n < x_n < b_n\}$

$= \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f_1(x_1) f_2(x_2) \cdots f_n(x_n)\,dx_1\,dx_2 \cdots dx_n$

$= \int_{a_1}^{b_1} f_1(x_1)\,dx_1 \int_{a_2}^{b_2} f_2(x_2)\,dx_2 \cdots \int_{a_n}^{b_n} f_n(x_n)\,dx_n$

$= P\{a_1 < x_1 < b_1\} P\{a_2 < x_2 < b_2\} \cdots P\{a_n < x_n < b_n\}$

This result states that the probability that the variables $x_1, \ldots, x_n$ will
simultaneously satisfy the indicated inequalities is equal to the product
of the probabilities of the individual variables satisfying these inequalities.
This property is the analogue for continuous variables of (10)
for events.

The frequency function whose graph is given in Fig. 12 is an illustration
of a joint frequency function of two independent random variables. In
the present notation $f_1(x_1) = e^{-x_1}$ and $f_2(x_2) = e^{-x_2}$.

It should be noted that in writing probability statements for continuous
variables, such as in (33) (iii), it is irrelevant whether $a_i < x_i < b_i$ or
$a_i \le x_i \le b_i$ is used to determine the desired region because the integral
is the same for the two cases. This property does not hold, however, for
discrete variables.

By using integrals in place of sums, formulas can be derived for marginal
and conditional distributions just as in the case of discrete variables.
Since the derivations are somewhat sophisticated at this stage of the
theory and are not needed for some time, they are postponed to a later
chapter.


REFERENCES 

A more extensive treatment of the various ideas and definitions of this chapter may be 
found in the following two books : 

Feller, W., An Introduction to Probability Theory and Its Applications, Vol. 1, Second
Edition, John Wiley and Sons. 

Parzen, E., Modern Probability Theory and Its Applications, John Wiley and Sons. 

EXERCISES 

1. A die has 2 of its sides painted red, 2 black, and 2 yellow. If the die is 
rolled twice, describe a 2-dimensional sample space for the experiment. What 
probabilities would you assign to the various sample points? 

2. A coin is tossed 3 times. Describe a 1 -dimensional sample space for the 
experiment. What probabilities would you assign to the various sample points? 

3. If the die in problem 1 is rolled until a red side comes up, describe a sample 
space for the experiment. What probabilities would you assign to the various 
sample points ? 

4. A box contains 3 white and 2 black balls. One ball is to be drawn from it. 
Describe a sample space and assign probabilities to the sample points when 

(a) the balls are also numbered, (b) like colored balls cannot be distinguished.

5. A box contains 2 white and 2 black balls. Two balls are to be drawn from 
the box. Describe a 2-dimensional sample space and assign probabilities to the 
sample points when (a) the balls are also numbered, (b) like colored balls cannot
be distinguished. 

6. A bag contains 3 white and 1 black ball. Two balls are to be drawn from 
it. Describe a 2-dimensional sample space consisting of 4 points based on color 
and draw and assign probabilities to the sample points. Would 3 points have 
sufficed ? 

7. Two balls are drawn from an urn containing 2 white, 3 black, and 4 green 
balls. (a) What is the probability that the first is white and the second is black?
(b) What is this probability if the first ball is replaced before the second drawing?

8. One urn contains 2 white and 2 black balls; a second urn contains 2 white 
and 4 black balls. (a) If 1 ball is chosen from each urn, what is the probability
that they will be the same color? (b) If an urn is selected by chance and 1 ball
is drawn from it, what is the probability that it will be a white ball? (c) If an
urn is selected by chance and 2 balls are drawn from it, what is the probability
that they will be the same color?

9. Compare the chances of rolling a 4 with 1 die and rolling a total of 8 with 
2 dice. 

10. If 6 dice are rolled, what is the probability that each of the numbers 1 
through 6 will occur? 




11. Assuming that the ratio of male children is find the probability that in a 
family of 6 children (a) all children will be of the same sex; (b) the 4 oldest
children will be boys and the 2 youngest will be girls; (c) exactly half the children
will be boys.

12. Successive drawings of a card from an ordinary deck are made with
replacement each time. How many drawings are necessary before the probability 
is at least | that an ace will be obtained at least once? 

13. Two boxes contain 1 black and 2 white balls and 2 black and 1 white ball,
respectively. One ball is transferred from the first to the second box, after which
a ball is drawn from the second box. What is the probability that it is white?

14. A coin is tossed. If it comes up heads, a die is thrown and you are paid
the number showing in dollars. If it comes up tails, two dice are thrown and
you are paid in dollars the sum of the numbers showing. What is the probability
that you will be paid at most four dollars?

15. A card is drawn from an ordinary deck. What is the probability that it is a 
king, given that it is a face card? 

16. Two dice are rolled. What is the probability that the sum of the faces 
exceeds 8, given that one (or more) of the faces is a 6? 

17. A box contains 2 red tickets numbered 1 and 2 and 2 green tickets numbered
1 and 2. If two tickets are drawn from the box, what is the probability that both
will be red, given that one of them is known to be (a) red, (b) red ticket numbered
1?

18. A group of businessmen consists of 30 per cent Democrats and 70 per 
cent Republicans. If 20 per cent of the Democrats and 40 per cent of the Repub- 
licans smoke cigars, what is the probability that a cigar-smoking businessman is 
a Republican? 

19. A test for detecting cancer which appears promising has been developed. 
It was found that 98 per cent of the cancer patients in a large hospital reacted 
positively to the test, whereas only 4 per cent of those not having cancer did so. 
If 3 per cent of the patients in the hospital have cancer, what is the probability 
that a patient selected by chance who reacts positively to the test will actually 
have cancer? 

20. Each of 3 boxes has 2 drawers. One box contains a gold coin in each 
drawer, another contains a silver coin in each drawer, and the third contains a 
gold coin in one drawer and a silver coin in the other. A box is chosen, a drawer 
is opened and found to contain a gold coin. What is the probability that the
coin in the other drawer is silver?

21. A, B, and C in order toss a coin. The first one to throw a head wins.
What are their respective chances of winning? Note that the game may continue 
indefinitely. 

22. Fourteen quarters and 1 five-dollar gold piece are in one purse and 15 
quarters are in another purse. Ten coins are taken from the first purse and 
placed in the second, and then 10 coins are taken from the second and placed 
in the first. How much money could you expect to get if you chose the first
purse? How much if you chose the second purse?





23. If a poker hand of 5 cards is drawn from a deck, what is the probability 
that it will contain 2 aces ? 

24. What is the probability that a bridge hand will contain 13 cards of the 
same suit? 

25. If a box contains 40 good and 10 defective fuses and 10 fuses are selected, 
what is the probability that all will be good? 

26. From a group of 50 people, 3 are to be chosen. Find the probability that 
none of 10 certain people in the group will be chosen. 

27. If the numbers 1, 2, • • • , n are arranged in order by chance, what is the
probability that the numbers 1 and 2 will appear next to each other ? 

28. What is the probability that the bridge hands of north and south together 
contain exactly 3 aces ? 

29. If a bridge player and his partner have 9 spades between them, what is the 
probability that the 4 spades held by their opponents will be split two and two ? 

30. What is the probability that of 4 cards drawn from a deck 2 will be black 
and 2 red ? 

31. If you hold 3 tickets to a lottery for which n tickets were sold and 5 prizes 
are to be given, what is the probability that you will win at least 1 prize? 

32. Let x and y denote the respective number of heads obtained in tossing 2 
coins twice. Calculate the probability that y - x will be less than 1. 

33. A tosses 3 coins and B tosses 2 coins, simultaneously. The one with the 
greater number of heads wins. (a) What is the probability that A will win? 
(b) What is this probability if the experiment is repeated whenever a tie occurs? 

34. A bag contains 1 black ball and 2 white balls. A ball is drawn and replaced 
by a ball of the opposite color. Then another ball is drawn from the bag. Find 
the conditional probability that the first ball drawn was white, given that the 
second ball drawn was white. 

35. Find the probability that a poker hand of 5 cards will contain only black 
cards, (a) given that it contains at least 3 black cards, (b) given that it contains 
at least 3 spades. 

36. Find the probability that a poker hand (5 cards) contains no card smaller 
than 7, given that it contains at least 1 card over 10, where aces are considered 
as high cards. 

37. Three cards are drawn from an ordinary deck. (a) If it is known that the 
hand contains at least 2 aces, what is the probability that it contains 3 aces? 
(b) If it is known that the hand contains the 2 red aces, what is the probability 
that it contains 3 aces? 

38. Show that (/j) +(") 

39. Given the discrete frequency function $f(x) = e^{-1}/x!$, x = 0, 1, 2, ..., (a) 
calculate P{x = 2}; (b) calculate P{x < 2}; and (c) show that $e^{-1}$ is the proper 
constant for this frequency function. 

40. A coin is tossed until a head appears, (a) What is the probability that a 
head will first appear on the third toss? (b) What is the probability f(x) that x 
tosses will be required to produce a head? (c) Graph the frequency function f(x). 

41. If the probability is ^ that a finesse in bridge will be successful, (a) what 
is the probability that 3 out of 5 such finesses will be successful? (b) What is 
the probability, f(x), that x out of 5 such finesses will be successful? (c) Graph 
the frequency function f(x). 

42. Graph the distribution function F(x) for the frequency function obtained 
in problem 40. 

43. Graph the distribution function F(x) for the frequency function obtained 
in problem 41. 

44. Two dice are rolled. Let x be the difference of the face numbers showing, 
the higher minus the lower. Find the frequency function of x. 

45. A box contains 3 red and 2 black balls. Two balls are drawn from the 
box. Let x equal the number of red balls obtained. Find the frequency function 
of x and also its distribution function. 

46. A die is tossed once. If a 4, 5, or 6 comes up, let x equal the number 
showing. If a 1, 2, or 3 comes up, toss the die again and let x equal the sum of 
the two numbers that came up. Find the frequency function of x. 

47. In the game of odd man wins, 3 people toss a coin. The game continues 
until someone has an outcome different from the other 2. The individual with 
the different outcome wins. Let x equal the number of games needed before a 
decision is reached. Find the frequency function of x. 

48. There are N tickets numbered 1, 2, ..., N, from which n are chosen. 
Let x equal the smallest number appearing on the tickets drawn. Find the 
frequency function of x. 

49. Let x and y denote the number of heads obtained in tossing a coin twice. 
Find an expression for the frequency function f(x, y). 

50. Let x and y denote the number of heads obtained in tossing a pair of coins 
twice. Find an expression for the frequency function f(x, y). 

51. Six dice are rolled. Let x denote the number of 1's and y the number of 
2's that show. Find an expression for f(x, y), the probability of obtaining x 1's 
and y 2's. 

52. Five cards are drawn from a deck. Let x denote the number of aces and y 
denote the number of kings that show. Find an expression for f(x, y), the 
probability of obtaining x aces and y kings. 

53. For the first illustration in section 2.11, calculate the values of (a) f(1), 
(b) g(0), (c) f(y | 1), (d) g(x | 0). 

54. For the distribution given by (23) in section 2.11, find expressions for the 
marginal and conditional distributions f(1) and f(y | 1). 

55. For the distribution of problem 49, find the marginal distribution f(x) 
and the conditional distribution f(y | x). Comment. 

56. For the distribution of problem 50, find an expression for the conditional 
distribution g(x | y). 

57. Calculate the marginal values f(1) and g(3) for problem 51. 

58. Use a result from problem 57 to obtain an expression for the conditional 
distribution f(y | 1) for the distribution of problem 51. 

59. Consider a deck of cards consisting of the ace, king, queen, and jack of 
each of the 4 suits. If 2 cards are drawn from this deck, and x and y denote the 
number of spades and hearts obtained, find (a) the marginal distribution of x 
and (b) an expression for the conditional distribution of y for x = 1. 

60. Given f(x, y) = cxy at the points (1, 1), (2, 1), (2, 2), (3, 1), and zero 
elsewhere, (a) evaluate c; (b) find f(x); (c) find f(y | x). 

61. Explain why 2 variables x and y cannot be independently distributed, 
regardless of the nature of f(x, y), if the region in the xy plane where f(x, y) is 
positive is the triangular region of Fig. 8. 

62. Explain why 2 variables x and y cannot be independently distributed if 
the region in the xy plane where f(x, y) is positive is not a rectangle (possibly 
infinite) with sides parallel to the coordinate axes. 

63. Given the continuous frequency function $f(x) = cxe^{-x}$, x > 0, (a) determine 
the proper value for c; (b) calculate P{x < 1}; and (c) calculate P{1 < x < 3}. 

64. Given the continuous frequency function f(x) = c, 0 < x < 2, (a) deter- 
mine the proper value for c; (b) calculate P{x < 1}; and (c) calculate P{x > 1.5}. 

65. Find the distribution function F(x) and graph it if the frequency function 
of x is (a) f(x) = 1, $0 \le x \le 1$; (b) f(x) = x for $0 \le x \le 1$ and f(x) = -x + 2 
for $1 < x \le 2$; and (c) $f(x) = [\pi(1 + x^2)]^{-1}$. 

66. If $f(x) = e^{-x}$, x > 0, find a number $x_0$ such that the probability is J that 
x will exceed $x_0$. 

67. Suppose the life in hours, x, of a type of radio tube has the frequency 
function f(x) = c/x^, x > 500. (a) Evaluate c. (b) Find the distribution function 
of x. (c) Calculate the probability that a tube will last at least 1000 hours. 

68. Suppose the probability that an atom of a radioactive material will dis- 
integrate in time t is given by $1 - e^{-\alpha t}$, where $\alpha$ is a constant depending on the 
material. Find the frequency function of x, the length of life for such an atom. 

69. If half the radioactive material of problem 68 will disintegrate in 1000 
units of time, calculate the probability that the life of an atom of this material 
will exceed 2000 units of time. 

70. Given the joint frequency function /(a;, y) == oi > 0, 2/^0, 

calculate P{x < 1, y < 1}. 

71. Given the joint frequency function f(x, y) = 8xy, $0 \le x \le 1$, $0 \le y \le x$, 

calculate (a) P{x < .5, y < .25}; (b) P{x < .5}; and (c) P{y < .25}. (d) From 
the preceding calculations, what conclusions can be made concerning the inde- 
pendence of the variables x and y? 

72. If x and y are independent random variables with the same continuous 
distribution function F, find an expression for P{x < t, y < t}. Use this to find 
the distribution function G(z) of the variable z = max(x, y). 

73. If $f(x) = e^{-\mu}\mu^{x}/x!$, x = 0, 1, 2, ..., and $f(y \mid x) = \binom{x}{y} p^{y}(1 - p)^{x-y}$, y = 
0, 1, 2, ..., x, show that the marginal frequency function of y is given by 
$g(y) = (\mu p)^{y} e^{-\mu p}/y!$. 

74. Show that for the events A, B, C, the probability that at least one of the 
events will occur is given by P{A} + P{B} + P{C} - P{AB} - P{AC} - P{BC} + 
P{ABC}. 

75. Give an example of two random variables x and y that are not independent 
but such that $x^2$ and $y^2$ are independent. 



CHAPTER 3 

Nature of Statistical Methods 


3.1 Mathematical Models 

The preceding two chapters have indicated to some extent the nature 
of statistical methods. The emphasis there was on experiments of the 
repetitive type, whether real or conceptual. Statisticians are mainly 
interested in constructing and applying mathematical models for experi- 
ments of this type. The advantage of such a model is that it enables the 
statistician to study properties of the experiment and to make predictions 
about the outcomes of future trials of the experiment, both of which 
would be difficult or impossible to do without such a model. 

The process of constructing a model on the basis of experimental data 
and drawing conclusions from it is an example of inductive inference. 
When it is applied to statistical problems, it is usually called statistical 
inference. Thus statisticians are principally engaged in making statistical 
inferences. 

Most often the statistician is interested in constructing a mathematical 
model for a random variable associated with an experiment rather than 
for the experiment itself. For example, if x represents the number of 
defective parts that will be found in a lot of 100 parts submitted for 
inspection, he would prefer to have a model that predicts the frequency 
with which the various values of x will be obtained rather than one that 
predicts the frequency with which the various possible experimental out- 
comes will occur when 100 parts are selected from the production process. 
As a consequence, most of the models chosen by statisticians are frequency 
functions of random variables. Statistical inferences are therefore usually 
inferences about frequency functions. 

As an illustration of the preceding ideas, suppose a biologist has ob- 
served that 44 out of 200 insects of a given type possess markings that are 
different from those of the rest. Suppose, further, that the biologist 
suspects that the markings are inherited according to a law which implies 
that 25 per cent of such insects would be expected to possess the less 
common markings. If he assumes that the inheritance law is operating 
here and lets x represent the number of insects out of 200 that will possess 
the less common markings, then the model that he would naturally select 
is the frequency function 


(1)    $$f(x) = \frac{200!}{x!\,(200 - x)!} \left(\frac{1}{4}\right)^{x} \left(\frac{3}{4}\right)^{200 - x}$$


This particular frequency function is the same as the frequency function 
given by (19), Chapter 2, because the two problems are equivalent from 
a probability point of view if the observations made on insects are con- 
sidered as independent trials of an experiment. 

If there had been no theory to suggest that $\frac{1}{4}$ of such insects should 
possess the unusual markings, the biologist might have chosen this same 
frequency function with the probability $\frac{1}{4}$ replaced by the observed rela- 
tive frequency .22. 

By means of (1) it would be possible for the biologist to make predictions 
about future sets of 200 observations and thus detect disagreements 
with his theory. 
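
As a rough numerical sketch of how such predictions might be computed, the following Python fragment evaluates the binomial model with n = 200 and p = 1/4 and sums it over a range of values of x. The function names are illustrative only, and the choice of P{x <= 44} as the quantity to examine is an assumption made for the example, not something prescribed by the text.

```python
from math import comb

def binom_pmf(x, n=200, p=0.25):
    """Binomial frequency function n!/(x!(n-x)!) * p**x * (1-p)**(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_at_most(k, n=200, p=0.25):
    """P{x <= k} under the binomial model (sum of the frequency function)."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))

if __name__ == "__main__":
    # Probability of exactly 44, and of 44 or fewer, marked insects in 200
    # if the inheritance law (p = 1/4) holds.
    print(f"f(44)       = {binom_pmf(44):.4f}")
    print(f"P(x <= 44)  = {prob_at_most(44):.4f}")
```

A small value of such a tail probability would suggest disagreement between the observations and the assumed law; how small is small enough is the subject of the next section.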

In its most general formulation, statistical inference is a type of decision 
making based on probability. The statistician is largely engaged in con- 
structing methods for making decisions. In a more limited sense, however, 
a large share of the inferences, or decisions, made by statisticians fall 
into one of two categories. Either they involve the testing of some hy- 
pothesis about the frequency function selected as the model, or they involve 
the estimation of parameters or other characteristics of this frequency 
function. These two types of statistical inference will be studied briefly 
in the next two sections from a general point of view but are applied 
throughout the book and studied further in Chapter 9. 


3.2 Testing Hypotheses 

Since the variety of statistical hypotheses that occur in applications is 
very large, a fairly general definition of what constitutes a statistical 
hypothesis is needed. Such a definition is the following: 

(2) Definition : A statistical hypothesis is an assumption about the fre- 
quency function of a random variable. 

As an illustration for a discrete variable, consider the problem of the 
preceding section. If p denotes the proportion of all insects possessing 
the less common markings, then the assumption that $p = \frac{1}{4}$ is a statistical 
hypothesis. As an illustration for a continuous variable, suppose the 
random variable x represents the time that elapses between two successive 
trippings of a Geiger counter in studying cosmic radiation and suppose 
it is assumed that the frequency function for x is a function of the form 

(3)    $$f(x; \theta) = \theta e^{-\theta x}$$

where $\theta$ is a parameter whose value depends on the experimental con- 
ditions. The assumption that the frequency function is a function of this 
particular form is obviously a statistical hypothesis. If it is assumed that 
the parameter $\theta$ is equal to 2, then this assumption is also a statistical 
hypothesis. 

Now consider what is meant by testing a statistical hypothesis. A 
general definition can be expressed in the following form: 

(4) Definition : A test of a statistical hypothesis is a procedure for 
deciding whether to accept or reject the hypothesis. 

This definition permits the statistician unlimited freedom in designing 
a test; however, he will obviously be guided by its desirable properties. 
Thus a simple but ordinarily useless test is one in which a coin is tossed 
and it is agreed to accept the hypothesis in question if, and only if, the 
coin turns up a head. 

In order to illustrate how the statistician proceeds in attempting to 
design a test that possesses desirable properties, consider a problem related 
to the frequency function (3). Suppose a physicist is certain, from 
theoretical or experimental considerations, that the time that elapses 
between two successive trippings on a counter possesses the frequency 
function (3). Suppose further that he is quite certain that for the material 
with which he is working the value of the parameter is either 2 or 1, with 
his intuition favoring the value 2. To assist him in making a choice, the 
statistician might proceed in the following manner. 

Assume that the frequency function (3) applies. Although this assump- 
tion constitutes a statistical hypothesis, it will not be tested here because 
the physicist is quite certain of the validity of this assumption. Assume 
that the parameter $\theta$ has the value 2. This assumption is the statistical 
hypothesis to be tested. Denote this hypothesis by $H_0$. Let $H_1$ denote 
the alternative hypothesis that $\theta = 1$. Thus the problem is one of testing 
the hypothesis $H_0$ against the single alternative $H_1$. 

To test $H_0$, a single observation is made on the random variable x; 
that is, a single time interval between two successive trippings of the 
counter is measured. In real-life problems one usually takes several 
observations, but to avoid complicating the discussion at this stage only 
one observation is taken here. On the basis of the value of x obtained, a 
decision will be made either to accept $H_0$ or to reject it. The latter decision, 
of course, is equivalent to accepting $H_1$. The problem then is to deter- 
mine what values of x should be selected for accepting $H_0$ and what 
values for rejecting $H_0$. If a choice has been made of the values of x that 
will correspond to rejection, then the remaining values of x will necessarily 
correspond to acceptance. It is customary to call the rejection values the 
critical region of the test. For this problem, the sample space may be 
considered as the positive half of the x axis. Every possible outcome can 
be represented by a point on this line with its x coordinate giving the value 
of the associated random variable x. Since only one observation is being 
made here, the sample space is one dimensional. If n observations were 
to be taken, the corresponding sample space would be n dimensional, with 
one coordinate axis for each observation. In order to have a definition 
of the critical region that is applicable to more general sample spaces, it 
is formulated as follows: 

(5) Definition: The critical region of a test of a statistical hypothesis 
is that part of the sample space that corresponds to the rejection of the 
hypothesis being tested. 

In terms of the foregoing language, the problem of constructing a test 
of $H_0$ for the problem under discussion is therefore the problem of choosing 
a critical region for the test. 


3.2.1 Two Types of Error 

Now suppose that the statistician arbitrarily selects the part of the x 
axis to the right of x = 1 as the critical region. To decide whether this 
was a wise choice, consider its consequences. If $H_0$ is actually true and 
the observed value of x exceeds 1, $H_0$ will be rejected because it has been 
agreed to reject $H_0$ whenever the sample point falls in the critical region. 
This, of course, is an incorrect decision. This kind of error is called the 
type I error. On the other hand, if $H_1$ is actually true and the observed 
value of x does not exceed 1, $H_0$ will be accepted. This also is an incorrect 
decision. This kind of error is called the type II error. These two in- 
correct decisions, as well as the two correct decisions that are possible 
here, are displayed in Table 1. 

Table 1 

                            $H_0$ true          $H_1$ true 
$x > 1$ (reject $H_0$)      Type I error        Correct decision 
$x \le 1$ (accept $H_0$)    Correct decision    Type II error 

It is necessary to measure in some way the seriousness of making either 
one of these errors before one can judge whether the choice of a critical 
region was wise. This can be accomplished by using what is known as 
the size of an error as the measure of its seriousness. 

(6) Definition: The size of the type I error is the probability that the 
sample point will fall in the critical region when $H_0$ is true; the size of the 
type II error is the probability that the sample point will fall in the non- 
critical region when $H_1$ is true. 

Now, in terms of the sizes of the two types of error, it is possible to 
introduce a simple principle to follow in determining good tests of hy- 
potheses. It may be expressed as follows. 

(7) Principle: Among all tests possessing the same size type I error, 
choose one for which the size of the type II error is as small as possible. 

Other principles can easily be suggested: for example, minimizing the 
sum of the sizes of the two types of error. However, principle (7) has 
proved to be very useful in constructing tests. A statistician often deter- 
mines in advance what size type I error he will tolerate. Then if the 
number of runs of his experiment is fixed, he will attempt to construct his 
test to minimize the size of the type II error. For a fixed number of runs 
of an experiment, the size of the type II error will usually increase if the 
size of the type I error is decreased; hence one cannot make the type 
I error as small as desired without paying for an increasingly large type II 
error. In real-life experiments it is often necessary to adjust the type I 
error until a satisfactory balance has been reached between the sizes of 
the two errors. The type I and type II error sizes are usually denoted by 
the letters $\alpha$ and $\beta$, respectively. For the sake of avoiding lengthy dis- 
cussions regarding the practical consequences of possible choices for the 
sizes of these two errors, a convention of almost always choosing the size 
of the type I error as .05 is adopted. This means that approximately 5 per 
cent of the time true hypotheses being tested will be rejected. The value 
of $\alpha = .05$ is quite arbitrary here and some other value could have been 
agreed on; however, it is the value of $\alpha$ most commonly used by applied 
statisticians. In any applied problem one can calculate the value of $\beta$ and 
then adjust the value of $\alpha$ if the value of $\beta$ is unsatisfactory when $\alpha = .05$. 
This works both ways, of course. For a very large experiment, with $\alpha$ fixed 
at .05, it might turn out that $\beta$ would be considerably smaller than .05. If 
the type I error were considered more serious than a type II error, one 
would need to adjust the test to make $\alpha$ smaller than $\beta$, which would then 
make $\alpha$ smaller than .05. 

Now consider the problem under discussion from the point of view of 
this principle. If the sizes of the two types of error for the selected critical 
region, namely x > 1, are denoted by $\alpha$ and $\beta$, respectively, then, because 
the two competing hypotheses here are $H_0: \theta = 2$ and $H_1: \theta = 1$, it follows 
from (3) and (6) that 

$$\alpha = \int_1^{\infty} 2e^{-2x}\,dx = .135$$

and 

$$\beta = \int_0^{1} e^{-x}\,dx = .632$$


Since probabilities correspond to areas under graphs of frequency func- 
tions, these values may be represented geometrically as indicated in 
Fig. 1. 
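
The two error sizes for the critical region x > 1 can be checked numerically. The sketch below assumes that the frequency function (3) is the exponential density with parameter theta, an assumption consistent with the values .135 and .632 quoted above; the helper name `tail_prob` is illustrative.

```python
from math import exp

def tail_prob(a, theta):
    """P{x > a} for the density theta * exp(-theta * x), x > 0."""
    return exp(-theta * a)

# Critical region: x > 1.
alpha = tail_prob(1.0, theta=2.0)        # P{reject H0 | theta = 2}
beta = 1.0 - tail_prob(1.0, theta=1.0)   # P{accept H0 | theta = 1}

print(round(alpha, 3), round(beta, 3))   # prints 0.135 0.632
```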

In order to decide whether the preceding test, that is, the choice of a 
critical region, was a good one, it follows from (7) that it is necessary to 
compare this test with other tests for which $\alpha = .135$. Here only one 
other test is considered as a competitor, namely, the test that uses the 
left “tail” rather than the right “tail” of the graph of the frequency func- 
tion under $H_0$ as the critical region. Thus the critical region for the com- 
peting test consists of the part of the x axis to the left of the point $x_0$, where 
$x_0$ is such that 

$$\int_0^{x_0} 2e^{-2x}\,dx = .135$$



Fig. 1. Graphs showing sizes of two types of error. 





Fig. 2. Graphs for a competing test. 

If the integration is performed and tables of exponentials are consulted, 
it will readily be found that $x_0 = .07$. From (6), it then follows that 

$$\beta = \int_{.07}^{\infty} e^{-x}\,dx = .932$$

Graphs showing the sizes of the two types of error for the competing test 
are given in Fig. 2. 
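
In place of tables of exponentials, the cutoff for the left-tail test can be found by solving the size condition directly. The sketch below again assumes the exponential form for the frequency function (3); the variable names are illustrative. It also shows that the value .932 arises when the cutoff is rounded to .07 before the type II error is computed.

```python
from math import exp, log

alpha = exp(-2.0)   # size of the right-tail test x > 1, about .135

# Left-tail test: solve 1 - exp(-2 * x0) = alpha for the cutoff x0.
x0 = -log(1.0 - alpha) / 2.0

# Type II error size when theta = 1: probability that x exceeds the cutoff.
beta_unrounded = exp(-x0)     # cutoff carried to full precision
beta_rounded = exp(-0.07)     # cutoff rounded to .07 first

print(round(x0, 4), round(beta_unrounded, 3), round(beta_rounded, 3))
```

Either way the type II error of the left-tail test is roughly .93, far larger than the .632 obtained for the right-tail test.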

It is clear from comparing the two values of $\beta$ that the first test is 
superior to the second. The second test would incorrectly reject $H_1$ 
93 per cent of the time, whereas the first test would do so only 63 per cent 
of the time. Both tests have very large type II errors, but this is to be 
expected when only one observation is taken. By using methods that are 
developed in Chapter 9, it can be shown that the first test selected is the 
best test that can be constructed for this problem according to principle 
(7). 

These principles of test construction apply to discrete variable problems 
also. As a simple illustration of how to make discrete variable computa- 
tions, consider the following academic problem. A coin is known to be 
either an honest coin or one that yields twice as many heads as tails. A 
decision is to be made as to which type of coin it is by tossing it three times 
and observing the number of heads, x, that result. 

The problem may be formalized by choosing 

$$H_0: p = \tfrac{1}{2} \quad \text{and} \quad H_1: p = \tfrac{2}{3}$$

Here p denotes the probability that a head will be obtained when the coin 
is tossed once. Since the coin is to be tossed three times, the random 
variable x can assume only the values 0, 1, 2, or 3. The four points on the 
x axis corresponding to these values may be chosen as the sample space 
here, even though the natural sample space for the experiment of tossing 
three coins would be one consisting of eight points in three dimensions. 
If one is interested only in the number of heads that turn up, then the 
four-point sample space is more convenient to work with than the original 
sample space. When $H_0$ is true, the probabilities that should be assigned 
to the four sample points in this space are those displayed in Fig. 3. These 
values were calculated in the manner illustrated in exercise (d) of 2.8.4. 

Fig. 3. $H_0$ probabilities. 

Now consider two different choices for the critical region of this test, 
namely, the point x = 0 and the point x = 3. Except for convenience, 
these two parts of the sample space were chosen quite arbitrarily. They 
should serve to illustrate techniques and principles in test construction 
for discrete variables. From Fig. 3 it will be seen that both of these critical 
regions yield a type I error of size $\alpha = \frac{1}{8}$; hence they are equally good as 
far as making type I errors is concerned. 

When $H_1$ is true, calculations similar to those employed in illustrations 
(d) and (e) of 2.8.4 with $p = \frac{2}{3}$ will show that the probabilities that should 
be assigned to the four sample points are those listed in Fig. 4. Now 
when x = 0 is chosen as the critical region for the test, the size of the 
type II error is equal to $\beta = \frac{26}{27}$ because that is the probability that x 
will not assume the value 0. On the other hand, when x = 3 is chosen as 
the critical region, the value of $\beta$ is $\frac{19}{27}$ because that is the probability 
that x will not assume the value 3. Thus it is clear that x = 3 is a better 
critical region than x = 0 for testing $H_0$ against $H_1$. 
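
The discrete computations above can be reproduced with exact fractions. The sketch below takes the honest coin as p = 1/2 and the coin that yields twice as many heads as tails as p = 2/3; the function names `head_probs` and `error_sizes` are illustrative and not from the text.

```python
from fractions import Fraction
from math import comb

def head_probs(p, n=3):
    """Probabilities of x = 0, 1, ..., n heads in n independent tosses."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

h0 = head_probs(Fraction(1, 2))   # honest coin
h1 = head_probs(Fraction(2, 3))   # twice as many heads as tails

def error_sizes(critical):
    """Return (alpha, beta) for a critical region given as a set of x values."""
    alpha = sum(h0[x] for x in critical)       # reject H0 although it is true
    beta = 1 - sum(h1[x] for x in critical)    # accept H0 although H1 is true
    return alpha, beta

print(error_sizes({0}))   # (Fraction(1, 8), Fraction(26, 27))
print(error_sizes({3}))   # (Fraction(1, 8), Fraction(19, 27))
```

Both regions have the same size, so by principle (7) the comparison rests entirely on the second component, which favors x = 3.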

In discussing critical regions, it is customary to call them critical 
regions of size $\alpha$ if the magnitude of the type I error is $\alpha$. Thus in the pre- 
ceding illustration the two competing critical regions were of size $\frac{1}{8}$. 
One difficulty in applying these methods to discrete variable problems 
is that critical regions of specified sizes cannot always be chosen without 
resorting to other devices. In the preceding illustration, for example, 
one cannot directly choose a critical region of size $\alpha = .05$. This difficulty 
is seldom much of a problem in real-life applications because then experi- 
ments are usually sufficiently large to permit a wide choice of sizes for 
the type I error. Moreover, there are techniques available, which are 
discussed in a later chapter, that enable one to construct critical regions 
of any desired size, even for problems such as the preceding one. 

 1/27    6/27    12/27    8/27 
   •       •       •       • 
   0       1       2       3 

Fig. 4. $H_1$ probabilities. 

In the following chapters tests of hypotheses are made without being 
concerned whether the critical region selected is the best possible; how- 
ever, after Chapter 9 has been studied it will be found that the tests in 
the earlier chapters were well chosen from this point of view. For the 
simpler problems, the critical region that the experimenter carefully selects 
on an intuitive basis is likely to be a good one from the point of view of 
principle (7). 

3.2.2 Power Function 

The problem considered in the preceding sections to illustrate the general 
methods for selecting a good test was easy to discuss largely because there 
was only a single alternative $H_1$ to the hypothesis $H_0$ being tested. Most 
problems, however, involve more than a single alternative. For example, 
if one were interested in knowing whether the proportion of defective 
parts in a manufacturing process is increasing, he might wish to test the 
hypothesis $H_0: p = p_0$, in which $p_0$ is the proportion of defectives found 
in the past, against the hypothesis $H_1: p > p_0$. For the first problem 
discussed in 3.2.1, it might well be that the physicist would have preferred 
to test the hypothesis $H_0: \theta = 2$ against the alternative $H_1: \theta < 2$ rather 
than against the alternative $H_1: \theta = 1$. The experimenter often has 
theoretical or empirical reasons for knowing what value of the parameter 
to test, but he seldom knows what particular alternative value will hold 
if $H_0$ is false. 

For such more general classes of alternatives, the size of the type II 
error will depend on the particular alternative value of $\theta$ being con- 
sidered. In order to determine how good the chosen test may be, compared 
to a competing test, it is therefore necessary to compare the type II errors 
for all possible alternative values of $\theta$ rather than for just one alternative 
value as before. For this purpose, it is necessary to consider the calcula- 
tion of the size of the type II error as a function of $\theta$. The size of this error 
is denoted by $\beta(\theta)$. Now, from (6), $\beta(\theta)$ is the probability that the sample 
point will fall in the noncritical region when $\theta$ is the true value of the pa- 
rameter. It is usually more convenient to work exclusively with the critical 
region; therefore it is customary to calculate $1 - \beta(\theta)$, which is the 
probability that the sample point will fall in the critical region when $\theta$ 
is the true value of the parameter. The function $1 - \beta(\theta)$ is called the 
power function and may be defined formally as follows. 





(8) Definition: The power function $P(\theta)$ of a test is the function of the 
parameter that gives the probability that the sample point will fall in the 
critical region of the test when $\theta$ is the true value of the parameter. 

Since $P(\theta) = 1 - \beta(\theta)$, seeking a test that minimizes the type II 
error $\beta(\theta)$ is equivalent to seeking one that maximizes the power $P(\theta)$. 

The problems that were considered in the preceding section are used 
to illustrate how the power function can assist one in selecting good tests 
when there is more than a single alternative value of the parameter. For 
the first illustration, let the hypothesis to be tested be $H_0: \theta = 2$, as before, 
but let the alternative hypothesis now be $H_1: \theta < 2$ rather than $H_1: \theta = 1$. 
As before, let x > 1 and x < .07 be the respective critical regions of size 
$\alpha = .135$ for the two competing tests, and let $P_1(\theta)$ and $P_2(\theta)$ denote the 
power functions for the two tests. From (8), the power functions of these 
tests are given by integrating the frequency function (3) over the respective 
critical regions; hence 

(9)    $$P_1(\theta) = \int_1^{\infty} \theta e^{-\theta x}\,dx = e^{-\theta} \quad \text{and} \quad P_2(\theta) = \int_0^{.07} \theta e^{-\theta x}\,dx = 1 - e^{-.07\theta}$$



The graphs of $P_1(\theta)$ and $P_2(\theta)$, which are called the power curves for 
the two tests, are shown in Fig. 5. These curves must intersect at the point 
(2, .135) because the power function gives the probability that the sample 
point will fall in the critical region and this probability has been chosen 
equal to $\alpha = .135$ when $H_0: \theta = 2$ is true. Since the power curve for the 
first test lies above the power curve for the second test for all values of 
$\theta < 2$ and the only alternative values permitted in the problem are those 
given by $H_1: \theta < 2$, it follows that the first test is superior to the second 
for the problem under discussion. 
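
The comparison of the two power curves can be sketched numerically. The fragment below assumes the exponential form for the frequency function (3), so that the power of the right-tail test is the probability that x exceeds 1 and the power of the left-tail test is the probability that x falls below .07; the function names and the grid of parameter values are illustrative.

```python
from math import exp

def power_right(theta, c=1.0):
    """Power of the test with critical region x > c."""
    return exp(-theta * c)

def power_left(theta, c=0.07):
    """Power of the test with critical region x < c."""
    return 1.0 - exp(-theta * c)

# Tabulate both power functions for several alternatives theta <= 2.
for theta in (0.25, 0.5, 1.0, 1.5, 2.0):
    print(theta, round(power_right(theta), 3), round(power_left(theta), 3))
```

Because the cutoff .07 is a rounded value, the two computed powers agree only approximately at the hypothesized value 2; for every smaller value in the grid the right-tail test has the larger power.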



Fig. 5. Two competing power curves. 




By means of a theorem that will be studied in Chapter 9, it can be 
shown that any test whose critical region is of size $\alpha = .135$ will yield a 
power curve that nowhere lies above the power curve for this first test; 
consequently, this test is the best possible for the problem under discus- 
sion. It was stated in an earlier section that this is the best possible test 
for the alternative $H_1: \theta = 1$, but now a stronger statement is being made, 
namely, that this test is the best possible for every alternative value of $\theta$ 
satisfying the inequality $\theta < 2$. 

Not only is the power function useful for assisting one in comparing
tests and finding best tests when more than one alternative value of the
parameter exists, but it is also useful for determining the effectiveness
of a given test for making the correct decision as a function of the parameter
value. For example, the power function $P_1(\theta)$ given by (9) shows
that the probability is .37 of making the correct decision, namely, rejecting
$H_0$ when $\theta = 1$, and that this probability rises to .61 when $\theta = \frac{1}{2}$. By
studying the power function, or power curve, of a test the experimenter
can determine his chances of detecting various possible alternative values
of the parameter that may occur and thus determine whether his experiment
is large enough to give him the confidence that he would like in
whatever decision will be made by the test.
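
The quoted values can be checked numerically. The sketch below assumes that the frequency function (3) is $\theta e^{-\theta x}$ for $x > 0$ and that the first critical region is $x > 1$; both assumptions are inferred from the values .135, .37, and .61 quoted in the text. The lower-tail region used for comparison is a hypothetical second test of the same size and is not taken from the text.

```python
import math

def power_upper_tail(theta, c=1.0):
    """P(x > c) when x has frequency function theta * exp(-theta * x), x > 0 (assumed form of (3))."""
    return math.exp(-theta * c)

def power_lower_tail(theta, c):
    """P(x < c) under the same assumed frequency function."""
    return 1.0 - math.exp(-theta * c)

# Size of the region x > 1 when H0: theta = 2 is true.
ALPHA = power_upper_tail(2.0)

# Hypothetical competing test: a lower-tail region 0 < x < C_LOWER with the same size.
C_LOWER = -math.log(1.0 - ALPHA) / 2.0

for theta in (0.5, 1.0, 1.5, 2.0):
    print(theta,
          round(power_upper_tail(theta), 3),
          round(power_lower_tail(theta, C_LOWER), 3))
```

Under these assumptions the two power values coincide at $\theta = 2$, and the upper-tail test has the larger power for every $\theta < 2$.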

For the second illustration of the preceding section, consider the construction
of the power function for the better of the two tests discussed
there. The construction for the other test would be similar. Since the
critical region consists of the point $x = 3$, it is necessary to calculate the
probability that $x$ will assume the value 3, given that the probability of a
head in a single toss of the coin is $p$. But this is merely the probability of
getting three heads in three tosses of the coin. Since the tosses are independent,
this probability is $p^3$; consequently, the power function here is
given by

$$P(p) = p^3.$$

The graph of this power function is shown in Fig. 6. It is clear from Fig.
6 that the critical region $x = 3$ is a poor one if the coin is biased in favor
of tails. For example, if $p$ were equal to $\frac{1}{3}$, the probability of rejecting
the incorrect hypothesis that the coin is honest is only $\frac{1}{27}$. The test is
really not much good unless the alternative value of $p$ is close to 1. Thus,
if $p$ were equal to $\frac{5}{6}$, the probability of rejecting $H_0$ would rise to $\frac{125}{216}$. In
order to obtain a good test here, it would be necessary to toss the coin
considerably more than three times.
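
A short sketch of this power function may help in seeing how slowly it rises. The function name `coin_power` and the particular values of $p$ are illustrative choices, not taken from the text; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def coin_power(p, n_tosses=3):
    """Probability of obtaining n_tosses heads in n_tosses independent tosses."""
    return p ** n_tosses

# Illustrative values of p; p = 1/2 corresponds to an honest coin.
for p in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 4), Fraction(9, 10)):
    power = coin_power(p)
    print(p, power, round(float(power), 3))
```

The value at $p = \frac{1}{2}$ is the size of the test; raising `n_tosses` shows how a larger experiment changes the curve.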

The preceding material on how statistical hypotheses are set up and 
tested may appear somewhat artificial to someone experienced with real- 
life problems in testing hypotheses. Very often one has no precise value 






Fig. 6. A power curve for testing a coin.


$\theta_0$ of a parameter $\theta$ to test but only an approximate value based on
experience. If this approximate value is treated as the precise value $\theta_0$
to be tested and the test accepts the hypothesis $H_0$, this does not mean that
one believes $\theta_0$ is the true value of $\theta$. Rather, it means that the true value
of $\theta$ is probably in the neighborhood of $\theta_0$ and that from a practical point
of view it is safe to treat $\theta_0$ as the true value. The practical conclusions to
be drawn from the test of a statistical hypothesis are by no means the same
as the statistical conclusions obtained by following the procedure outlined
above for testing a precise hypothesis. Further information on the practical
interpretation of statistical methods is given in the following chapters
in the applications of the theory. The next section takes up the problem of
determining how close a $\theta_0$ based on experience is likely to be to the true
value of $\theta$.

Although this book is concerned principally with methods for testing 
hypotheses and estimating parameters of frequency functions, there are 
many problems that cannot be treated adequately by means of such 
methods. For example, a decision-making problem may involve choosing 
one of three possible decisions rather than one of two. One would there- 
fore like to know how best to proceed in making the choice. Some of 
these methods are discussed in a later chapter. 


3.3 Estimation 

Most of the problems of estimation in statistics are those of estimating
parameters of frequency functions. For example, the physicist interested
in studying cosmic radiation would be interested in estimating the parameter
$\theta$ in the frequency function (3) because this parameter determines
the rate of the radiation. By taking a number of observations on his
variable $x$, he could use the resulting data to estimate the value of $\theta$.




Two kinds of estimates of parameters are in common use. One is 
called a point estimate and the other is called an interval estimate. A 
point estimate is the familiar kind of estimate; that is, it is a number 
obtained from computations on the observed values of the random 
variable which serves as an approximation to the parameter. For example, 
the observed proportion of defective parts in 50 consecutive parts turned 
out by a machine is a point estimate of the true proportion p for that 
machine. An interval estimate is an interval determined by two numbers 
obtained from computations on the observed values of the random 
variable that is expected to contain the true value of the parameter in its 
interior. Interval estimates are considered briefly in Chapter 6 and more 
fully in Chapter 9; therefore, only point estimates are discussed here. 

In order to know how to use several observations of a random variable
in an intelligent manner for constructing a point estimate of a parameter 
of the frequency function of the random variable, it is desirable to have
some general principle to follow, just as it was in testing hypotheses. 
The principle, or method, should be such that the estimates obtained by 
using the method will possess desirable properties. For example, if two 
different methods are applied to the same sets of observations and if one 
method produces estimates that are consistently closer to the value of the 
parameter being estimated than those of the other method, then the first 
method would obviously be preferred. Properties of good point estimates 
are considered in some detail in Chapter 9; here it suffices to describe a 
method that is usually preferred by most statisticians and to state that the 
method possesses many desirable properties. This method of estimation,
known as the maximum likelihood method, is used in the following 
chapters whenever the problem arises of finding a point estimate of a 
parameter of a frequency function. It is defined after some necessary nota- 
tion has been introduced. 

Let $f(x; \theta)$ be the frequency function of the random variable $x$. Here $\theta$
is the parameter to be estimated. Suppose that $n$ observations are to be
made of the variable $x$. Let $x_1, x_2, \ldots, x_n$ denote the $n$ random variables
corresponding to these $n$ observations. Then the function given by

(10)  $$L(x_1, \ldots, x_n; \theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)$$

defines a function of the random variables $x_1, \ldots, x_n$ and the parameter
$\theta$ which is known as the likelihood function.

For the purpose of interpreting this function, suppose that the observational
values are obtained from $n$ independent trials of an experiment for
which $f(x; \theta)$ is the frequency function of a discrete random variable $x$.
Then, for any particular set of observational values, because of (24),
Chapter 2, the likelihood function gives the probability of obtaining that
set of values, including their order of occurrence. If, however, $x$ is a
continuous variable, the likelihood function gives the probability density
at the sample point $(x_1, x_2, \ldots, x_n)$, where the sample space is thought of
as being $n$ dimensional.
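
The product in (10) is simple to compute directly. In the sketch below the helper names `likelihood` and `bernoulli`, the two-valued frequency function, and the observed sequence are all hypothetical illustrations and do not come from the text.

```python
import math

def likelihood(freq, xs, theta):
    """Product of freq(x_i, theta) over the observations, as in (10)."""
    return math.prod(freq(x, theta) for x in xs)

def bernoulli(x, p):
    """Hypothetical discrete frequency function: x = 1 with probability p, x = 0 otherwise."""
    return p if x == 1 else 1.0 - p

observed = [1, 0, 1, 1]   # illustrative data only

# For a discrete variable this is the probability of the exact ordered sequence.
for p in (0.25, 0.5, 0.75):
    print(p, likelihood(bernoulli, observed, p))
```

Evaluating the same data at several parameter values, as the loop does, is exactly the viewpoint taken below when the observations are treated as fixed.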

Now, for a given set of observational values, an estimate of $\theta$ is merely
a number obtained from calculations made on the observational values;
however, from the point of view of procedure, an estimate is a function
of the observational values. For example, the function

$$(x_1 + x_2 + \cdots + x_n)/n$$

is a typical estimate. It is customary for some statisticians to use the word
estimator for the function and the word estimate for the value of the function
after the observational values have been inserted. Thus

$$(x_1 + x_2 + \cdots + x_n)/n$$

would be called an estimator of $\theta$, whereas its numerical value in any given
problem would be called an estimate of $\theta$. Other statisticians, however,
use the word estimate both for the function and its numerical value.

Using the notation and terminology of the preceding paragraphs, the 
method of maximum likelihood estimation may be defined in the following 
manner: 

(11) Definition: A maximum likelihood estimator $\hat{\theta}$ of the parameter
$\theta$ in the frequency function $f(x; \theta)$ is an estimator that maximizes the
likelihood function $L(x_1, \ldots, x_n; \theta)$ as a function of $\theta$.

If the $x_i$ are treated as fixed, the likelihood function becomes a function
of $\theta$ only, say $L(\theta)$; consequently, the problem of finding a maximum
likelihood estimator is the problem of finding the value of $\theta$ that maximizes
$L(\theta)$. This maximizing value of $\theta$ is, of course, a function of the
$x_i$ that have been treated as fixed; hence, if one is discussing a maximum
likelihood estimator it is necessary to write $\hat{\theta} = \hat{\theta}(x_1, x_2, \ldots, x_n)$ to show
that the estimator is a function of the observational values rather than
just a number.

Maximum likelihood estimators can usually be obtained by calculus
methods because the relative maximum of the likelihood function obtained
by differentiating $L(x_1, \ldots, x_n; \theta)$ with respect to $\theta$ and setting
the derivative equal to zero is usually an absolute maximum.

As an illustration of the calculus technique for finding maximum
likelihood estimates, consider the problem of estimating the parameter
$\theta$ in the frequency function (3) if five observations on $x$ yielded the values
$x_1 = .9$, $x_2 = 1.7$, $x_3 = .4$, $x_4 = .3$, and $x_5 = 2.4$.



The maximum likelihood estimator is first obtained as follows. By
means of (3) and (10), the likelihood function is

$$L = \theta e^{-\theta x_1} \cdot \theta e^{-\theta x_2} \cdots \theta e^{-\theta x_n} = \theta^n e^{-\theta \sum_{i=1}^{n} x_i}.$$

Then, differentiating with respect to $\theta$, and collecting terms,

$$\frac{dL}{d\theta} = \theta^{n-1} e^{-\theta \sum_{i=1}^{n} x_i} \left[ n - \theta \sum_{i=1}^{n} x_i \right].$$

Setting $dL/d\theta = 0$ and solving for $\theta$, it will be observed that either $\theta = 0$
or the quantity in brackets is 0. Since there is no frequency function when
$\theta = 0$, the only nontrivial solution of this equation is

(12)  $$\hat{\theta} = \frac{n}{\sum_{i=1}^{n} x_i}.$$

This is the desired maximum likelihood estimator of $\theta$. It will be observed
that this estimator is merely the reciprocal of the arithmetic mean of the
$x_i$.

In order to find the maximum likelihood estimate for the given observations,
it is merely necessary to choose $n = 5$ and insert the five given
observational values in (12). Computations yield the estimate $\hat{\theta} = .88$.
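
The computation can be reproduced, and the closed form checked against a direct search, with the following sketch. It assumes that the frequency function (3) is $\theta e^{-\theta x}$, as the likelihood above indicates; the function names and the search grid are illustrative choices.

```python
import math

def log_likelihood(theta, xs):
    """Logarithm of theta**n * exp(-theta * sum(xs))."""
    return len(xs) * math.log(theta) - theta * sum(xs)

def mle_exponential(xs):
    """Closed form (12): n divided by the sum of the observations."""
    return len(xs) / sum(xs)

xs = [0.9, 1.7, 0.4, 0.3, 2.4]           # the five observations given in the text
theta_hat = mle_exponential(xs)
print(round(theta_hat, 2))

# Crude numerical check: search a grid of theta values for the largest log likelihood.
grid = [k / 1000 for k in range(1, 5001)]
best = max(grid, key=lambda t: log_likelihood(t, xs))
print(best)
```

The grid search is only a sanity check on the calculus; it should land within one grid step of the closed-form value.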

As a second illustration, let $p$ be the probability that an event $A$ will
occur when an experiment is performed and let the experiment be repeated
until $A$ does occur. Further, let $x$ denote the number of experiments that
are required before $A$ occurs. Here the frequency function of $x$ is

(13)  $$f(x) = (1 - p)^{x-1} p$$

because $x - 1$ successive failures, followed by a success, for the event
$A$ must occur if the event $A$ is to occur the first time on experiment number
$x$. The problem is to find the maximum likelihood estimator of $p$. Now
the function given by (13) is also the likelihood function; therefore its
maximum with respect to the parameter $p$ must be found. It is convenient
here to take logarithms and then maximize $\log f(x)$ by calculus
methods. The value of $p$ that maximizes $\log f(x)$ will be the same as the
value that maximizes $f(x)$. Thus

$$\log f(x) = (x - 1) \log (1 - p) + \log p.$$

Hence

$$\frac{d \log f(x)}{dp} = -\frac{x - 1}{1 - p} + \frac{1}{p}.$$





If this derivative is set equal to 0, it will be found that the value of $p$ which
satisfies the resulting equation is given by

$$\hat{p} = \frac{1}{x}.$$

Thus, if $A$ were the event of getting a 1 to turn up in rolling a die, the
estimate of $p$, whose value is $\frac{1}{6}$ here, would be the reciprocal of the number
of rolls needed before a 1 appeared.

As a slight generalization of this problem, suppose a set of $n$ such
experiments is carried out. Let $x_1, x_2, \ldots, x_n$ denote the number of trials
of the experiment required before $A$ occurs in each group of experiments.
Each of the $x_i$ possesses the frequency function given in (13); therefore
the likelihood function now is

$$L = \prod_{i=1}^{n} f(x_i) = (1 - p)^{x_1 - 1} p \cdot (1 - p)^{x_2 - 1} p \cdots (1 - p)^{x_n - 1} p = (1 - p)^{\sum x_i - n} p^n.$$

As before, the maximum is easier to find if one first takes logarithms.
Thus

$$\log L = \left( \sum x_i - n \right) \log (1 - p) + n \log p.$$

Hence

$$\frac{\partial \log L}{\partial p} = -\frac{\sum x_i - n}{1 - p} + \frac{n}{p}.$$

The solution of the equation obtained by setting this derivative equal to
zero is given by

$$\hat{p} = \frac{n}{\sum x_i}.$$

The similarity of this result with that given in (12) should not tempt one
to generalize about the nature of maximum likelihood estimates.
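
As a rough check on this result, the sketch below evaluates $\log L$ on either side of $n / \sum x_i$. The list `rolls` is hypothetical data, invented here for illustration, giving the number of rolls of a die needed for a 1 to appear in each of five groups; the function names are likewise illustrative.

```python
import math

def geometric_log_likelihood(p, xs):
    """log L = (sum(xs) - n) log(1 - p) + n log p."""
    n = len(xs)
    return (sum(xs) - n) * math.log(1.0 - p) + n * math.log(p)

def mle_geometric(xs):
    """Solution of the likelihood equation: n divided by the sum of the observations."""
    return len(xs) / sum(xs)

rolls = [4, 9, 2, 7, 8]   # hypothetical data, not from the text
p_hat = mle_geometric(rolls)
print(p_hat)

# The log likelihood should be no larger a small step to either side of p_hat.
for p in (p_hat - 0.01, p_hat, p_hat + 0.01):
    print(round(p, 4), geometric_log_likelihood(p, rolls))
```

With a single observation the same function reduces to the estimate $1/x$ obtained earlier.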

Although the discussion of estimation has been limited to that of esti- 
mating a parameter of a frequency function, there are methods available 
for estimating various properties of a frequency function, such as its 
maximum value. In addition, there are methods for estimating the fre- 
quency function itself. In doing so it is customary to estimate the distribu- 
tion function either by a broken line curve, which corresponds to a point 
estimate, or by a pair of such curves that are expected to contain the true 
distribution function curve between them, which corresponds to an 
interval estimate. 




Problems of estimation are often more delicate than those of testing
hypotheses because there is usually more danger of being misled when
estimating a parameter of an incorrect model than when testing a hypothesis
about it. For example, an experiment may be designed to compare
two groups of animals, one treated and the other untreated. If the two
groups do differ and one tests the hypothesis that they do not, then in a
well-designed experiment one is likely to reject that hypothesis even though
an incorrect model may have been chosen to represent the behavior of
the animals. Estimates of a parameter for the two groups, however,
might be very misleading in describing the behavior of the animals if the
model chosen to do so were unrealistic.


REFERENCES

More extensive discussions of the ideas presented in this chapter may be found in
several of the books listed in the references for Chapter 1.


EXERCISES

1. Given the frequency function $f(x; \theta) = 1/\theta$, $0 \le x \le \theta$, and 0 elsewhere,
if you are testing the hypothesis $H_0: \theta = 1$ against $H_1: \theta = 2$ by means of a
single observed value of $x$, (a) what would the sizes of the type I and type II
errors be if you chose the interval $.5 \le x$ as the critical region? (b) What would
the sizes of these errors be if you chose the interval $1 \le x \le 1.5$ as the critical
region?

2. Suppose you wish to test a hypothesis $H_0$ against an alternative $H_1$ by
tossing a coin once and agreeing to accept $H_0$ if a head shows and to accept $H_1$
otherwise. (a) What are the values of $\alpha$ and $\beta$ for this test? (b) What would
the values of $\alpha$ and $\beta$ be if you tossed the coin twice and agreed to accept $H_0$
if 2 heads showed and to accept $H_1$ otherwise?

3. Given that $x$ has the frequency function $f(x; \theta) = \frac{1}{2}$, $\theta - 1 \le x \le \theta + 1$,
and 0 elsewhere, if $H_0: \theta = 4$ and $H_1: \theta = 5$ and the critical region is to be of
size $\alpha = .25$ and to consist of a single interval, show by a sketch which critical
region you would choose and determine what the value of $\beta$ would be for that
choice, assuming that the test is to be based on a single observed value of $x$.

4. What critical region with $\alpha = .5$ would you choose in problem 1 if you
wanted a critical region of this size that minimizes $\beta$?

5. Given $f(x; \theta) = (1 + \theta) x^{\theta}$, $\theta > 0$, $0 \le x \le 1$, and 0 elsewhere, if the
hypothesis $H_0: \theta = 1$ is to be tested by taking a single observation on $x$ and
using the interval $x < .5$ as the critical region, (a) calculate the value of $\alpha$ and
(b) calculate the probability of determining that $H_0$ is false if the true value of $\theta$





6. Let $x$ be a random variable whose frequency function values under $H_0$ and
$H_1$ are as follows.

x        1     2     3     4     5     6     7
$H_0$   .01   .02   .03   .05   .05   .07   .77
$H_1$   .03   .09   .10   .10   .20   .18   .30

(a) List all critical regions whose size is equal to .10.
(b) List all critical regions whose size does not exceed .10.
(c) Among the critical regions in (a), which has the smallest value of $\beta$?
(d) Are there any in (b) which have a still smaller value of $\beta$?

7. A box is known to contain either 3 red and 7 black balls or 7 red and 3 
black balls. Three balls are to be drawn from the box, and on the basis of their 
colors a decision relating to the contents of the box will be made. If Hq denotes 
the hypothesis that there are 3 red and 7 black balls and if Hq will be accepted 
unless 3 red balls are obtained, what are the values of $\alpha$ and $\beta$ here?

8. A bag is known to contain 9 black balls and either 1 or 2 white balls. To 
test the hypothesis that there is only 1 white ball, balls are drawn until a white 
one appears. Let x equal the number of balls drawn and find f(x) under both 
hypotheses. Choose a good critical region for the test and find its values of $\alpha$
and $\beta$.

9. Find the power function for problem 5 and graph it. 

10. If the region $x > 4.5$ is used as the critical region in problem 3, find the
power function for $\theta > 4$ and sketch it. Using your result, determine what
alternative values of $\theta > 4$ are such that $\beta < .25$.

11. If $p$ denotes the probability that an event $A$ will occur in a single trial of
an experiment, then $f(x; p) = (1 - p)^{x-1} p$ is the frequency function for $x$, the
number of trials needed before $A$ occurs. Find the power function for testing
H^'.p = J if the critical region consists of the points x = 1, 2, 3. Criticize this 
choice of critical region. 

12. Given the frequency function f{x\ 0) — j find the maximum 

likelihood estimator of 0 based on a sample of size n, 

13. Given the frequency function $f(x; \theta) = e^{-\theta} \theta^x / x!$, where $x$ can assume only
non-negative integer values, and given the six observed values 6, 11, 4, 8, 7, and
6, find the maximum likelihood estimate for $\theta$.

14. Find the maximum likelihood estimator for $\theta$ based on $n$ observations for
the frequency function $f(x; \theta) = (1 + \theta) x^{\theta}$, $\theta > -1$, $0 < x < 1$.



15. Given the frequency function $f(x; \theta) = \frac{e^{-x^2/2\theta^2}}{\theta \sqrt{2\pi}}$, find the maximum
likelihood estimator for $\theta$.

16. In problem 15 treat $\theta^2$ as the parameter to be estimated and write the
frequency function as $f(x; \lambda) = \frac{e^{-x^2/2\lambda}}{\sqrt{2\pi\lambda}}$, where $\lambda = \theta^2$. Now find the maximum
likelihood estimator for $\lambda$ and compare with the result for problem 15.




17. A box contains 10 balls, of which the proportion $p$ are white. Let $x$ equal
the number of white balls obtained in drawing 2 balls from the box. Find the
frequency function $f(x; p)$ and then find the value of $p$ that will maximize $f(x; p)$.
Here the only values that $p$ can assume are $p = i/10$, $i = 0, 1, \ldots, 10$.

18. For the frequency function $f(x; \theta) = 1/\theta$, $0 \le x \le \theta$, and 0 elsewhere,
show that for $n$ observations on this variable, the estimate $\hat{\theta}$ that maximizes the
likelihood $(1/\theta)^n$ must be $\hat{\theta} = \max \{x_1, x_2, \ldots, x_n\}$, that is, the largest of the $n$
observations.

19. Show that the likelihood function $L(\theta)$ will be maximized when $\log L(\theta)$ is
maximized if standard calculus methods may be used to obtain the maximum.



CHAPTER 4 


Empirical Frequency Distributions of 
One Variable 


4.1 Introduction

In this chapter and the next, statistical methods that involve only one 
random variable will be studied. This chapter is concerned with methods 
for extracting information from data that will be useful in helping to 
determine a model for the random variable giving rise to the data. For 
example, if x represents the range error in a radar tracking experiment and 
if 200 trackings have been made, it is important to know how to use the 
experimental data to help determine what frequency function should be 
selected for x. The emphasis in this chapter is on the practical mechanics 
of handling data, whereas the next chapter is concerned with the actual 
selection of the model. After the material of this chapter and the next 
have been completed, the problems of statistical inference discussed in 
the preceding chapter can begin to be solved. 

It is convenient in discussing statistical methods to call the totality of 
possible experimental outcomes the population of such outcomes. Then 
a set of data obtained from performing the experiment a number of times 
is called a sample from the population. In this language statistical infer- 
ence consists in drawing conclusions about a population by means of a 
sample extracted from the population. This chapter, therefore, is con- 
cerned with methods for extracting information from samples for use in 
studying the populations from which the samples were drawn. 

The type of information that should be extracted from a set of data 
depends upon the nature of the data and upon the model that is likely 
to be selected. In some problems one knows from theoretical considera- 
tions or from experience with similar problems what model should be 
used. For example, the frequency function that was introduced in (1), 
Chapter 3, is such a model. The frequency function given in (3), Chapter 
3, is another. All that is really needed from experimental data for such 


models is information that will give one good estimates of the parameters 
involved. In other problems neither theory nor experience is available to 
assist one in selecting a model. Then it is necessary to use experimental 
data to decide on a reasonable type of model before one can test hypo- 
theses about it or estimate its parameters. Fortunately, in testing certain 
hypotheses about frequency functions it is not necessary to know the 
frequency function too precisely, and therefore the information concerning 
it that can be obtained from moderate amounts of data may suffice to
describe it adequately for testing purposes.

In considering the nature of the data it is particularly important to 
distinguish between those sets of data for which the order in which the 
observations were obtained yields useful information and those sets for 
which it does not. For example, if one were interested in studying weather 
phenomena or the stock market from day to day, the order would be very 
important. Industrial experience indicates that the information obtained 
from considering the order in which articles are manufactured is indispensable
for efficient production. However, if one were interested in
studying certain characteristics of college students and had selected a set 
of students by choosing every twentieth name in a college directory, he 
would hardly expect the order in which the names were obtained to be of 
any value in the study. Methods for dealing with data for which order is
important are considered in later chapters. In this chapter the emphasis
is on techniques that do not use order information. The material in these
later chapters will enable the investigator to decide whether he is justified 
in assuming that he may ignore the order information present in his data. 


4.2 Classification of Data 

Suppose one is given the weights of 200 college men and he wishes to
use them to study the weight distribution of such men. Now it is very
difficult to look at 200 measurements and obtain any reasonably accurate
idea of how those measurements are distributed. For the purpose of
obtaining a better idea of the distribution of weights it is therefore convenient
to condense the data somewhat by classifying the measurements
into groups. It will then be possible to graph the modified distribution
and learn more about how weights are distributed. This condensation
will also be useful for simplifying the computations of various averages that
need to be evaluated, particularly if fast computing facilities are not available.
These averages will supply additional information about the distribution.
Thus the purpose of classifying data is to assist in the extraction of
certain kinds of useful information concerning the underlying distribution.




If the data are for a discrete variable, there is usually no need for classifi- 
cation. Thus data on the number of petals on flowers of a given species 
or the number of yeast cells on a square of a hemacytometer are naturally 
classified. There is usually little difficulty in performing the classification 
when there appears to be a need for it. 

If the data are for a continuous type of variable such as length, weight, 
or time, they are recorded to a certain digit or decimal accuracy. For 
example, if the diameter of a steel rod is measured to the nearest thousandth 
of an inch, a diameter of .431 inch assumes that the measurement, if 
taken to more decimal places, would lie between .4305 and .4315 inch. 

In classifying data for a continuous variable experience indicates that 
for most data it is desirable to use 10 to 20 classes. With less than 10 
classes too much accuracy is lost, whereas with more than 20 classes the 
computations become unnecessarily tedious. In order to determine 
boundaries for the various class intervals, it is merely necessary to know 
the smallest and largest observations of the set. As an illustration, sup- 
pose that 200 steel rods were measured and it was found that the smallest 
and largest diameters were .431 and .503 inch, respectively. Since the 
range of values, which is .072 inch here, is to be divided into 10 to 20 equal 
intervals, the class interval should be chosen as some convenient number 
between .0036 and .0072. A class interval of .005 inch will evidently be 
convenient. Since the first class interval should contain the smallest
measurement of the set, it must begin at least as low as .4305. Furthermore,
in order to avoid having measurements fall on the boundary of two
adjacent class intervals, it is convenient to choose class boundaries to $\frac{1}{2}$
a unit beyond the accuracy of the measurements. Thus in this problem it
would be convenient to choose the first class interval as .4305-.4355.
The remaining class boundaries are then determined by merely adding the
class interval .005 repeatedly until the largest measurement is enclosed in
the final interval. If .4305-.4355 is chosen as the first class interval, there
will be 15 class intervals and the last class interval will be .5005-.5055.
When the class boundaries have been determined, it is a simple matter
to list each measurement of the set in its proper class interval by merely
recording a short vertical bar to represent it. When the number of bars
corresponding to each class interval has been recorded, the data are said
to have been classified into a frequency table. It is assumed in such a
classification that all measurements in a given class interval, say the $i$th
interval, have the value at the midpoint of the interval. This value is
called the class mark and is denoted by $x_i$. Thus $x_1 = .433$ and $x_{15} = .503$
in the example just considered. The number of measurements found in
the $i$th class interval is denoted by $f_i$, and the total number of measurements
is denoted by $n$. Table 1 illustrates the tabulation and resulting
frequency table for the set of steel rods mentioned previously.
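
The construction of the class boundaries and class marks for the steel rod example can be sketched as follows. The function name `class_table` and the rounding to four decimals are illustrative choices; the starting boundary .4305, the class interval .005, and the largest measurement .503 are the values used in the text.

```python
def class_table(first_boundary, width, largest, digits=4):
    """Return (lower boundary, upper boundary, class mark) triples,
    adding the class interval repeatedly until `largest` is enclosed."""
    rows = []
    k = 0
    while True:
        lower = round(first_boundary + k * width, digits)
        upper = round(first_boundary + (k + 1) * width, digits)
        mark = round((lower + upper) / 2, digits)   # midpoint of the interval
        rows.append((lower, upper, mark))
        if upper > largest:
            break
        k += 1
    return rows

rows = class_table(0.4305, 0.005, 0.503)
print(len(rows))
for lower, upper, mark in rows:
    print(f"{lower:.4f}-{upper:.4f}  {mark:.3f}")
```

Changing `width` to another value between .0036 and .0072 shows how the number of classes varies within the recommended range of 10 to 20.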




Table 1

Class boundaries   Tally                                Class mark x   Frequency f
.4305-.4355        ||                                   .433            2
.4355-.4405        |||||                                .438            5
.4405-.4455        ||||| ||                             .443            7
.4455-.4505        ||||| ||||| |||                      .448           13
.4505-.4555        ||||| ||||| ||||| ||||               .453           19
.4555-.4605        ||||| ||||| ||||| ||||| ||||| ||     .458           27
.4605-.4655        ||||| ||||| ||||| ||||| ||||| ||||   .463           29
.4655-.4705        ||||| ||||| ||||| ||||| |||||        .468           25
.4705-.4755        ||||| ||||| ||||| ||||| |||          .473           23
.4755-.4805        ||||| ||||| ||||                     .478           14
.4805-.4855        ||||| ||||| |||||                    .483           15
.4855-.4905        ||||| ||||                           .488            9
.4905-.4955        ||||| |                              .493            6
.4955-.5005        ||||                                 .498            4
.5005-.5055        ||                                   .503            2

It is a common practice for many applied statisticians to indicate class
intervals in a slightly different form from that suggested above. They
record not actual class interval boundaries but rather noncontiguous
boundaries. Thus they would indicate the first three class intervals by
.431-.435, .436-.440, and .441-.445. When interval boundaries are so
indicated, the true boundaries are ordinarily halfway between the upper
and lower recorded boundaries of adjacent intervals. Another common
method of recording class intervals is to employ common boundaries but
to agree that an interval includes measurements up to but not including
the upper boundary. Then the first three class intervals above would be
indicated by .431-.436, .436-.441, and .441-.446. A measurement that
falls on a boundary is placed in the higher of the two intervals. If one
knows the accuracy of measurement of the variable, there is little difficulty
in determining the true class boundaries and class marks for these two
methods of classification. It is important to use the exact class marks;
otherwise a systematic error will be introduced in many of the computations
to follow.
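
One way to recover the true boundaries under the two recording conventions is sketched below. It assumes, as in the steel rod example, that measurements are recorded to the nearest .001 inch; the function names are hypothetical, and the treatment of the second convention (shifting both recorded limits down by half a unit) is this sketch's reading of the rule rather than a formula stated in the text.

```python
def true_boundaries(recorded, unit):
    """Noncontiguous limits such as .431-.435: extend each end by half a unit."""
    half = unit / 2
    return [(round(lo - half, 6), round(hi + half, 6)) for lo, hi in recorded]

def true_boundaries_upper_excluded(recorded, unit):
    """Common limits such as .431-.436 with the upper limit excluded:
    both ends are shifted down by half a unit (an assumed reading of the rule)."""
    half = unit / 2
    return [(round(lo - half, 6), round(hi - half, 6)) for lo, hi in recorded]

def class_marks(boundaries):
    """Midpoints of the true class intervals."""
    return [round((lo + hi) / 2, 6) for lo, hi in boundaries]

first = true_boundaries([(0.431, 0.435), (0.436, 0.440), (0.441, 0.445)], unit=0.001)
second = true_boundaries_upper_excluded(
    [(0.431, 0.436), (0.436, 0.441), (0.441, 0.446)], unit=0.001)

print(first)
print(class_marks(first))
print(first == second)
```

Under these assumptions both conventions lead back to the same true intervals and therefore to the same class marks.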


4.3 Graphical Representation of Empirical Distributions

A rough idea of how the values of a random variable are distributed
can be obtained from inspecting its histogram. The histogram for the
data of Table 1 for absolute frequencies is given in Fig. 1. It should be
noted that the class marks are at the midpoints of the bases of the rectangles
making up the histogram. If preferred, the histogram may be
drawn to show the class boundaries rather than the class marks.

Fig. 1. Distribution of the diameters of 200 steel rods.

Fig. 2. Distribution of 302,000 marriages classified according to the age
of the bridegroom. Frequencies are in units of 1000.

Fig. 3. Distribution of 727 deaths from scarlet fever classified according to age.
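
A rough text rendering of the histogram for the steel rod data can be produced from the class marks and frequencies of Table 1. The sketch below is only an illustration; the function name `text_histogram` and the use of one `#` per measurement are arbitrary choices, and the bars run horizontally rather than vertically.

```python
def text_histogram(marks, freqs, symbol="#"):
    """One line per class: class mark, a bar of length f, and the frequency."""
    return [f"{x:.3f} {symbol * f} {f}" for x, f in zip(marks, freqs)]

# Class marks and frequencies from Table 1.
marks = [round(0.433 + 0.005 * i, 3) for i in range(15)]
freqs = [2, 5, 7, 13, 19, 27, 29, 25, 23, 14, 15, 9, 6, 4, 2]

for line in text_histogram(marks, freqs):
    print(line)
```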

Fortunately, many important frequency distributions to be found in 
nature and industry are of a relatively simple form. They usually range 
from a bell-shaped distribution, like that in Fig. 1, to something resembling 
the right half of a bell-shaped distribution. A distribution of the latter 
type is said to be skewed, skewness meaning lack of symmetry with respect 
to a vertical axis. It will be found, for example, that the following variables
have frequency distributions that possess such forms in approximately 
increasing degrees of skewness: stature; various industrial measure- 
ments; weight; age at marriage; mortality age for certain diseases; 
and wealth. Figures 1, 2, and 3 represent three typical distributions with 
increasing degrees of skewness. 


4.4 Arithmetical Representation of Empirical Distributions

As explained earlier, the principal reason for classifying data and 
drawing the histogram of the resulting frequency table is to determine the 
nature of the distribution. Some of the theory that is developed in later 
chapters requires that the distribution be one that possesses a graph
similar to that in Fig. 1; consequently, it is necessary to know
whether one has this type of distribution before attempting to apply
such theories to it.

Although a histogram yields a considerable amount of general informa- 
tion concerning the distribution of a set of sample measurements, more 
precise and useful information for studying a distribution can be obtained 
from an arithmetical description of the distribution. For example, if 
the histogram of weights for a sample of 200 men from one college were 
available for comparison with the histogram of a similar sample from 
another college, it might be difficult to state, except in very general terms, 
how the two distributions differ. Rather than compare the two weight
distributions in their entirety, it might suffice to compare the average 
weights and the variation in weights of the two groups. 

The nature of a statistical problem largely determines whether a few 
simple arithmetical properties of the distribution will be enough to describe 
it satisfactorily. Most of the problems that are encountered in this book 
are the type that requires only a few simple properties of the distribution 
for its solution. For simple frequency distributions, such as those whose
graphs are given in Figs. 1, 2, and 3, this description is accomplished
satisfactorily by means of the low-order moments of the distribution,
which are defined in (1). In many problems the statistician is concerned
only with the first and second moments. In a few problems he uses the
first four moments, but seldom does he use more than four. One reason
for this is that the higher moments are so unstable in repeated sampling
experiments that little additional reliable information can be obtained
from them.

For data that have been classified let $x_i$ be the class mark for the ith 
class interval, $f_i$ the observed absolute frequency for the ith interval, h 
the number of intervals, and n the sum of the absolute frequencies. With 
this notation, empirical moments are defined as follows: 

(1) Definition: The kth moment about the origin of an empirical fre- 
quency distribution is given by 

$$m_k' = \frac{1}{n} \sum_{i=1}^{h} x_i^k f_i$$

If the data have not been classified, $x_i$ will represent the value of the ith 
observation, the $f_i$ will all be equal to 1, and h will be equal to n. The 
prime placed on $m_k'$ is to distinguish this kth moment from another 
moment to be defined later. 

Physics and calculus students are usually familiar with moments as 
they pertain to masses $f_i$ located on the x axis at distances $x_i$ from the 
origin. For example, the moment of inertia is essentially the second 
moment. Statistical interpretations of the low-order moments are given 
in the next two sections. 
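
Definition (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `empirical_moment` and the small data sets are hypothetical and do not come from the text.

```python
def empirical_moment(class_marks, frequencies, k):
    """k-th empirical moment about the origin, following definition (1)."""
    n = sum(frequencies)  # n is the sum of the absolute frequencies
    return sum(x**k * f for x, f in zip(class_marks, frequencies)) / n

# Hypothetical classified data: three class marks with their frequencies.
marks = [1.0, 2.0, 3.0]
freqs = [2, 5, 3]
first = empirical_moment(marks, freqs, 1)   # (1*2 + 2*5 + 3*3) / 10
second = empirical_moment(marks, freqs, 2)  # (1*2 + 4*5 + 9*3) / 10

# Unclassified data: every frequency equals 1 and h equals n.
observations = [4.0, 7.0, 10.0]
plain_mean = empirical_moment(observations, [1] * len(observations), 1)
```

For unclassified data the same function is reused by passing a frequency of 1 for every observation, which mirrors the remark that h then equals n.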


4.4.1 The First Moment as a Measure of Location 

The first moment about the origin, $m_1'$, is called the mean and is usually 
denoted by $\bar{x}$; hence 

(2)  $$\bar{x} = \frac{1}{n} \sum_{i=1}^{h} x_i f_i$$

For unclassified data $\bar{x}$ reduces to the familiar formula for the average 
of a set of numbers. Formula (2) is sometimes spoken of as the formula 
for the weighted mean; however, it is merely a variation of the familiar 
form adapted to classified data. Geometrically, the mean represents the 
point on the x axis where a sheet of metal in the shape of the histogram 
would balance on a knife edge. For a histogram like that of Fig. 1 it is 
clear that $\bar{x}$ defines a measure of location, that is, a value at which the data 
tend to center. The mean is ordinarily meant when the word average is 
used. For example, the statement that the average weight of a group of 
people is 140 pounds implies that this is their mean weight. 

If the $x_i$ and $f_i$ are not large, the value of $\bar{x}$ is easily computed from its 
definition, particularly if a calculating machine is available. Otherwise 
considerable time is saved for frequency tables having equal class intervals 
by using a short method based on introducing a new variable u, which takes 
on only small integral values and which is defined by 

(3)  $$x_i = c u_i + x_0$$

Here c is the class interval and $x_0$ is a conveniently chosen class mark. 
The computations are somewhat easier if $x_0$ is chosen as a class mark 
near the middle of the distribution. If this expression is substituted 
for $x_i$ in (2), 

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{h} (c u_i + x_0) f_i = \frac{1}{n} \sum_{i=1}^{h} c u_i f_i + \frac{1}{n} \sum_{i=1}^{h} x_0 f_i$$

Since c and $x_0$ are constants with respect to these summations, they may 
be factored out and placed in front of the summation signs; hence 

$$\bar{x} = c \cdot \frac{1}{n} \sum_{i=1}^{h} u_i f_i + x_0 \cdot \frac{1}{n} \sum_{i=1}^{h} f_i$$

From (2) it is clear that the coefficient of c is $\bar{u}$, and from the definition of 
n the coefficient of $x_0$ is 1; therefore 

(4)  $$\bar{x} = c\bar{u} + x_0$$

Since the computations needed to find $\bar{u}$ are relatively easy, the value of 
$\bar{x}$ can be obtained quite easily without the aid of a calculating machine. 
This short method is illustrated in Table 2. The data for this frequency 
distribution are the durations of 1000 telephone conversations in seconds, 
recorded to the nearest second. Here $x_0$ was chosen as 449.5 because this 
choice gives rise to smaller products in the uf column than other choices, 
although 549.5 is nearly as good. When (4) is applied to Table 2, 

$$\bar{x} = 100\left(\frac{257}{1000}\right) + 449.5 = 475.2$$
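
The short method can be checked against the direct definition with a brief Python sketch. The class marks and frequencies are those of Table 2; the variable names are illustrative choices, not notation from the text.

```python
# Class marks and frequencies from Table 2 (telephone conversations, seconds).
class_marks = [49.5, 149.5, 249.5, 349.5, 449.5, 549.5, 649.5, 749.5, 849.5, 949.5]
frequencies = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]

c = 100.0   # class interval
x0 = 449.5  # conveniently chosen class mark
n = sum(frequencies)

# Coded variable from (3): u_i = (x_i - x0) / c takes small integral values.
u = [round((x - x0) / c) for x in class_marks]
u_bar = sum(ui * f for ui, f in zip(u, frequencies)) / n

mean_short = c * u_bar + x0  # formula (4)
mean_direct = sum(x * f for x, f in zip(class_marks, frequencies)) / n  # formula (2)
```

Both routes give the same mean, since (4) is only an algebraic rearrangement of (2).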

For certain common types of distributions, the mean is superior to 
other ordinary measures of location, some of which are considered briefly 
later. This superiority rests largely on the fact that in repeated sampling 
experiments from such distributions the mean usually tends to be more 
stable than these other measures of location. For example, suppose one 
took a sample of five trees from a forest and calculated their mean height. 
Instead of the mean, one could have chosen, say, the middle height of the 
five as the measure of location. Now, if one repeated this experiment a 
large number of times, he would usually find that the set of means would 
tend to be more closely clustered than the set of middle measurements. 
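
The repeated sampling experiment described here can be imitated by simulation. The following Python sketch assumes, purely for illustration, that tree heights follow a normal distribution with mean 20 and standard deviation 3; these values, the seed, and the function name `simulate` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

def simulate(repetitions=5000, sample_size=5, mu=20.0, sigma=3.0):
    """Draw many samples of five heights; record each sample's mean and middle value."""
    means, middles = [], []
    for _ in range(repetitions):
        heights = [random.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.mean(heights))
        middles.append(statistics.median(heights))
    # The standard deviation of each collection measures how closely it clusters.
    return statistics.pstdev(means), statistics.pstdev(middles)

spread_of_means, spread_of_middles = simulate()
```

Under the normality assumption the set of means should show the smaller spread; for other populations the comparison can come out differently, as the text warns.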


Table 2 

      x        f       u      uf 

    49.5       6      -4     -24 
   149.5      28      -3     -84 
   249.5      88      -2    -176 
   349.5     180      -1    -180 
   449.5     247       0       0 
   549.5     260       1     260 
   649.5     133       2     266 
   749.5      42       3     126 
   849.5      11       4      44 
   949.5       5       5      25 

  Totals    1000             257 


This property of greater stability is particularly important in later work 
when a precise estimate of a population mean is desired. It should be 
clearly understood that the mean possesses these advantages only for 
certain types of distributions of particular importance which are con- 
sidered in later chapters. There are other well-known distributions for 
which the mean is a very poor measure of location. 


4.4.2 The Second Moment as a Measure of Variation 

The concept of variation is of paramount importance in statistics. 
Statistical methods have often been called methods for studying variation. 
The problem of measuring variation occurs repeatedly in the various 
sciences and in certain branches of industry. For example, in order to 
detect any lack of uniformity in the quality of a manufactured product, it 
is first necessary to know the variability of the product. This may be 
illustrated in the following manner. Suppose a purchaser of wire will not 
tolerate wire that does not possess a tensile strength of at least 50 pounds 
and that he is considering buying it from one or the other of two firms. 
If equal samples taken from the products of these two firms gave empirical 
frequency distributions like those shown in Fig. 4, it is clear that the prod- 
uct of only one of the firms would satisfy the purchaser's requirement. 
Since the mean tensile strength was 100 pounds in each sample, the pur- 
chaser would have had no basis for making a decision if the variation in 
tensile strength had been ignored. 

Fig. 4. Hypothetical distribution of tensile strength. 

It is customary to assume that variation means variation of the data 
about a measure of location. Since the mean is being used as the measure 
of location here, it is necessary to introduce moments about the mean in 
order to obtain a measure of variation from moments. Empirical mo- 
ments about the mean are defined as follows: 

(5) Definition: The kth moment about the mean of an empirical frequency 
distribution is given by 

$$m_k = \frac{1}{n} \sum_{i=1}^{h} (x_i - \bar{x})^k f_i$$

Now it will be shown that the second moment about the mean, $m_2$, can be 
considered as a measure of variation. Since it is often convenient to have 
a measure of variation in the same units of measurement as for the data, 
$\sqrt{m_2}$ is usually selected instead. This quantity is called the standard 
deviation and is denoted by s; hence 

(6)  $$s = \sqrt{\frac{1}{n} \sum_{i=1}^{h} (x_i - \bar{x})^2 f_i}$$

The second moment about the mean, $s^2$, which is more convenient than 
the standard deviation as a measure of variation in certain situations, is 
called the variance. Some authors define these two quantities with n 
replaced by n - 1. Their definitions have certain advantages for later 
work but seem quite unnatural here. This matter is considered in Chapter 
9. 

If one considers the computation of s for two distributions of differing 
spread, such as those whose histograms are given in Fig. 4, it should be 
clear that s does measure relative variation or spread for these distribu- 
tions. The distribution with the large tails will have a relatively larger 
value of s because the large deviations, when squared and multiplied by 
their relatively large frequencies, will contribute heavily to 
the value of the sum and will more than compensate for the larger fre- 
quencies for small deviations in the concentrated distribution. The 
interpretation of the standard deviation as a measure of absolute variation 
is presented a few paragraphs later. At present it is merely a number in 
the same units as x which seems to measure the relative extent to which 
data are concentrated about the mean and which becomes larger as the 
data become more dispersed. 

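The calculation of the standard deviation from its definition (6) becomes 
inaccurate unless an accurate value of $\bar{x}$ is used, and then the computa- 
tions usually become tedious. The change of variable introduced for 
computing the mean is also useful for obtaining a short method of com- 
puting the standard deviation for frequency tables having equal class 
intervals. From (3) and (4) it follows that 

$$x_i - \bar{x} = c(u_i - \bar{u})$$

Consequently, 

$$
\begin{aligned}
\frac{1}{n} \sum (x_i - \bar{x})^2 f_i &= \frac{c^2}{n} \sum (u_i - \bar{u})^2 f_i \\
&= \frac{c^2}{n} \sum (u_i^2 - 2\bar{u}u_i + \bar{u}^2) f_i \\
&= c^2 \left( \frac{\sum u_i^2 f_i}{n} - 2\bar{u}\,\frac{\sum u_i f_i}{n} + \bar{u}^2\,\frac{\sum f_i}{n} \right) \\
&= c^2 \left( \frac{\sum u_i^2 f_i}{n} - \bar{u}^2 \right)
\end{aligned}
$$

The short method for computing the standard deviation is therefore given 
by 

(7)  $$s = c \sqrt{\frac{\sum u_i^2 f_i}{n} - \bar{u}^2}$$

Hereafter, as in this derivation, the indicated range of summation will be 
omitted from the summation sign whenever the range is obvious. 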

For data that have not been classified, it is assumed in (5) that $x_i$ 
represents the ith observation, that all the $f_i$ are equal to 1, and that h equals 
n. The application to this case of the algebraic manipulations used to 
obtain (7) from (6) will yield the formula 

$$s = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$$

This form is often more convenient than (6) for unclassified data, particu- 
larly when the $x_i$ contain at most two digits each. 

Table 3 illustrates the technique for computing s for the data of Table 2. 




Table 3 

      x        f       u      uf    u^2 f 

    49.5       6      -4     -24      96 
   149.5      28      -3     -84     252 
   249.5      88      -2    -176     352 
   349.5     180      -1    -180     180 
   449.5     247       0       0       0 
   549.5     260       1     260     260 
   649.5     133       2     266     532 
   749.5      42       3     126     378 
   849.5      11       4      44     176 
   949.5       5       5      25     125 

  Totals    1000             257    2351 


When (7) is applied to Table 3, 

$$s = 100 \sqrt{\frac{2351}{1000} - (.257)^2} = 151$$

correct to the nearest integer. 
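
As a check on this arithmetic, the following Python sketch applies the short formula (7) to the columns of Table 3 and compares the result with the direct definition (6). The variable names are illustrative only.

```python
import math

# Columns f and u of Table 3; c is the class interval and x0 the chosen class mark.
frequencies = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]
u = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
c, x0 = 100.0, 449.5
n = sum(frequencies)

sum_uf = sum(ui * f for ui, f in zip(u, frequencies))      # total of the uf column
sum_u2f = sum(ui**2 * f for ui, f in zip(u, frequencies))  # total of the u^2 f column
u_bar = sum_uf / n

s_short = c * math.sqrt(sum_u2f / n - u_bar**2)  # formula (7)

# Direct computation from definition (6), for comparison.
class_marks = [x0 + c * ui for ui in u]
x_bar = sum(x * f for x, f in zip(class_marks, frequencies)) / n
s_direct = math.sqrt(sum((x - x_bar) ** 2 * f for x, f in zip(class_marks, frequencies)) / n)
```

The two values agree to rounding error, and both round to 151.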

In order to interpret the standard deviation as a measure of variation, 
it is necessary to anticipate certain results of later work. For a set of 
data that has been obtained by sampling a particular type of population 
called a normal population it will be shown that the interval $(\bar{x} - s, \bar{x} + s)$ 
will usually include about 68 per cent of the observations and that the 
interval $(\bar{x} - 2s, \bar{x} + 2s)$ will usually include about 95 per cent of the 
observations, provided that n is large. A sketch of a particular normal 
distribution is shown in Fig. 5, Chapter 5. 

As an illustrative example of this property, consider the data for which 
the standard deviation was just computed. Previous calculations gave 
$\bar{x} = 475$ and $s = 151$, correct to the nearest integer; consequently, the 
foregoing two intervals are (324, 626) and (173, 777), respectively. The 
number of observations lying within these intervals may be found approxi- 
mately by interpolating as though the observations in a given interval 
were dispersed uniformly throughout the interval. This assumption 
implies that on the histogram any fractional part of a class interval will 
include the same fractional part of the frequencies in that interval. For 
ease of interpolation, the histogram for this frequency distribution is 
shown in Fig. 5. If interpolation is carried to the nearest unit, it will be 
found that the interval (324, 626) will include 136 + 247 + 260 + 35 
measurements, which is 67.8 per cent of them. The interval (173, 777) 
excludes 6 + 21 + 9 + 11 + 5 measurements, which is 5.2 per cent. 
For a histogram as irregular as this, these results are unusually close to 
the theoretical percentages. However, even for histograms possessing a 
considerable lack of symmetry, the actual percentages are often surpris- 
ingly close to the theoretical percentages, primarily because the large 
percentage of measurements in the short tail included by such an interval 
is compensated to a considerable extent by the small percentage of 
measurements in the long tail which are included. 

Fig. 5. Histogram for the distribution of 1000 telephone conversations. 
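
The interpolation just described can be sketched in Python as follows. The sketch assumes, as the text does, that observations are spread uniformly within each class interval; the function name `count_between` is hypothetical, and the counts are not rounded class by class as they are in the text.

```python
def count_between(lower, upper, class_marks, frequencies, width):
    """Interpolated number of observations in (lower, upper), assuming each
    class's observations are spread uniformly over its class interval."""
    total = 0.0
    for mark, f in zip(class_marks, frequencies):
        left, right = mark - width / 2, mark + width / 2
        overlap = max(0.0, min(upper, right) - max(lower, left))
        total += f * overlap / width  # fractional part of the interval covered
    return total

class_marks = [49.5, 149.5, 249.5, 349.5, 449.5, 549.5, 649.5, 749.5, 849.5, 949.5]
frequencies = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]
n = sum(frequencies)

pct_one_s = 100 * count_between(324, 626, class_marks, frequencies, 100.0) / n
pct_two_s = 100 * count_between(173, 777, class_marks, frequencies, 100.0) / n
```

To one decimal place the percentages are 67.8 and 94.8, the latter corresponding to the 5.2 per cent excluded.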

For certain common types of data the standard deviation is superior 
to other common measures of variation, some of which are considered 
briefly later. The superiority rests partly on its greater stability in re- 
peated sampling experiments and partly on its convenience for developing 
statistical theory. The situation with respect to other measures of varia- 
tion is very much like that of the mean with respect to other measures of 
location. 


4.4.3 Higher Moments 

The two preceding sections were designed to give a statistical inter- 
pretation or meaning to the first two moments. It is more difficult to 
give satisfactory statistical meanings to the moments of higher order. 
Any general interpretation is likely to fail for even fairly reasonable 
kinds of distributions. 




The principal use of empirical moments beyond the second has been 
in fitting theoretical frequency distributions to empirical distributions, 
and this use has been restricted largely to the third and fourth moments 
about the mean. In such fitting problems it is customary to calculate the 
quantities 

(8)  $$a_3 = \frac{m_3}{m_2^{3/2}} \quad \text{and} \quad a_4 = \frac{m_4}{m_2^{2}}$$

and then use the four quantities $\bar{x}$, s, $a_3$, and $a_4$ to describe the empirical 
distribution. The reason for using $a_3$ and $a_4$ rather than $m_3$ and $m_4$ is that 
the former are independent of the units of measurement and the latter 
are not. The quantity $a_3$ is often called a measure of skewness because 
its value is 0 for a symmetrical distribution and is likely to be a large 
positive number for a distribution with a large right tail such as that in 
Fig. 3. The value of $a_3$ may be zero, however, for a nonsymmetrical 
distribution so that care must be used in interpreting $a_3$ as a measure of 
skewness. The quantity $a_4$ is occasionally given an interpretation as a 
measure of the peakedness of the distribution, but this interpretation is 
rather vague and of questionable value. It should suffice to use the first 
four moments as quantities which usually describe empirical distributions 
fairly well without necessarily giving these moments geometrical inter- 
pretations. 
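
The claim that these two ratios do not depend on the units of measurement can be illustrated with a short Python sketch. It uses the Table 2 frequencies measured both in seconds and in the coded units u of the short method; the function names are hypothetical, and a small symmetrical data set is added as an assumed example.

```python
def central_moment(values, frequencies, k):
    """k-th empirical moment about the mean, as in definition (5)."""
    n = sum(frequencies)
    mean = sum(v * f for v, f in zip(values, frequencies)) / n
    return sum((v - mean) ** k * f for v, f in zip(values, frequencies)) / n

def shape_measures(values, frequencies):
    """Return the pair (m3 / m2^(3/2), m4 / m2^2) from (8)."""
    m2 = central_moment(values, frequencies, 2)
    m3 = central_moment(values, frequencies, 3)
    m4 = central_moment(values, frequencies, 4)
    return m3 / m2**1.5, m4 / m2**2

frequencies = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]
u = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]    # coded units
seconds = [449.5 + 100.0 * ui for ui in u]  # original units

a3_seconds, a4_seconds = shape_measures(seconds, frequencies)
a3_coded, a4_coded = shape_measures(u, frequencies)

# An assumed symmetrical example: the third moment about the mean vanishes.
a3_symmetric, _ = shape_measures([1, 2, 3], [1, 2, 1])
```

Changing the origin and the scale leaves both ratios unchanged, which is the sense in which they are free of the units of measurement.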

In the next chapter, when theoretical frequency distributions are con- 
sidered, it will be found that the higher moments play an essential role in the 
theory. The reason for this is that it is often necessary to know the values 
of all the theoretical moments before a theoretical frequency function is 
completely determined. Thus moments beyond the second may be very 
important theoretically in determining frequency distributions even 
though empirical moments are not used a great deal to describe empirical 
frequency distributions. 


4.4.4 Other Descriptive Measures 



Among the other common measures of location are the median, mode, 
and geometric mean. 

For a set of measurements arranged in order of magnitude the median 
is defined as the middle measurement, if there is one, otherwise as the 
interpolated middle value. Thus for the set of measurements 2, 3, 3, 4, 
5, 5, 6, 6, 7, 7, 7, 9 the median is 5.5. For classified data the median is 
defined as the abscissa which divides the area of the histogram into two 
equal parts. Some workers prefer the median to the mean when the 
distribution is heavily skewed because they feel that it is more representa- 
tive of what a measure of location should be than the mean is under such 
circumstances. They might, for example, prefer the median when dis- 
cussing the notion of average wage of a community because a few very 
large incomes would produce a mean wage higher than the notion 
of average wage implies, whereas the median wage would not be so 
affected. 

The mode of a set of measurements is defined as the measurement with 
the maximum frequency, if there is one. For the set of measurements in 
the preceding paragraph, the mode is 7. If there is more than one measure- 
ment with the maximum frequency, no completely satisfactory definition 
exists. The mode is used occasionally in situations similar to those for 
which the median might be selected. Since the mode is of questionable 
value in descriptive statistics, it will not be considered further here. 
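
The median and mode of the set of measurements used above can be verified with Python's standard `statistics` module. This is merely a convenience check; the tied example passed to `multimode` is a hypothetical addition.

```python
import statistics

measurements = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 9]

# With an even number of values the median is interpolated between the two middle ones.
median_value = statistics.median(measurements)

# The mode is the measurement with the maximum frequency.
mode_value = statistics.mode(measurements)

# When several values share the maximum frequency, multimode lists all of them.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])
```

The tied case shows why no completely satisfactory definition of the mode exists when the maximum frequency is shared.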

The geometric mean of a set of measurements, assuming that they are 
positive, is defined as $\sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_h^{f_h}}$. If the data are classified, $x_i$ rep- 
resents the ith class mark; otherwise it represents the ith measurement, 
in which event all the $f_i$ equal 1. It will be observed that the logarithm of 
the geometric mean is equal to the arithmetic mean of the logarithms. 
This measure is used principally in working with business index numbers, 
for which it possesses certain advantages. 
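
The logarithmic property suggests a convenient way to compute the geometric mean, sketched below in Python. The function name and both small data sets are hypothetical.

```python
import math

def geometric_mean(values, frequencies):
    """n-th root of the product of x_i^f_i, computed through logarithms."""
    n = sum(frequencies)
    mean_log = sum(f * math.log(x) for x, f in zip(values, frequencies)) / n
    return math.exp(mean_log)  # log of the geometric mean = mean of the logs

# Unclassified example: all frequencies equal 1.
g_plain = geometric_mean([1.0, 10.0, 100.0], [1, 1, 1])

# Classified example: class mark 2 with frequency 2, class mark 8 with frequency 1.
g_classified = geometric_mean([2.0, 8.0], [2, 1])
```

Working through logarithms avoids forming a very large product when n is large.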

Among the more common measures of variation are the range and 
mean deviation. The range, which is the difference between the largest 
and smallest measurement in the set, is used as a measure of variation 
largely because of its ease of computation. It is often applied in certain 
industrial engineering work. It has two important disadvantages. First, 
its value usually increases with n because there is a better chance of 
obtaining extreme measurements if a large sample of data is taken than if 
a small sample is taken. It is possible, however, to make allowance for 
this growth and thus eliminate this disadvantage of the range. Second, 
the range is usually quite unstable in repeated sampling experiments of 
the same size when n is large; consequently, its use is ordinarily restricted 
to sets of data containing less than 10 observations each. Because of its 
importance in various fields, the range will be studied more fully in a 
later chapter. 

The mean deviation is defined as $\sum |x_i - \bar{x}| f_i / n$, where the absolute 
values, that is, the positive values of deviations, are employed. This 
measure of variation is often used because it appears to be easier to cal- 
culate and understand than the standard deviation. It will be found, 
however, that the short method of calculating the standard deviation is 
about as fast as calculating the mean deviation, when n is large. 
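
For comparison, the range, the mean deviation, and the standard deviation can be computed side by side. The Python sketch below reuses the twelve measurements from the discussion of the median, treated as unclassified data; the variable names are illustrative.

```python
import math

measurements = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 9]
n = len(measurements)
x_bar = sum(measurements) / n

# Range: difference between the largest and smallest measurement.
data_range = max(measurements) - min(measurements)

# Mean deviation: average absolute deviation from the mean (all f_i equal 1).
mean_deviation = sum(abs(x - x_bar) for x in measurements) / n

# Standard deviation with divisor n, as in definition (6).
std_deviation = math.sqrt(sum((x - x_bar) ** 2 for x in measurements) / n)
```

The mean deviation can never exceed the standard deviation computed with the same divisor, so the two measures are not interchangeable in size.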




Consideration was given to these other measures of location and 
variation only because they appear quite often in certain fields of applica- 
tion and the student of statistical methods should be acquainted with 
them. However, for the present, moments will be selected as the preferred 
set of descriptive measures unless there are valid reasons for doing other- 
wise. 

An interesting example of a theoretical distribution for which moments 
are a poor choice of descriptive measures is the distribution whose fre- 
quency function is 

(9)  $$f(x) = \frac{1}{\pi[1 + (x - \theta)^2]}$$

For this distribution, which is known as the Cauchy distribution, it turns 
out that the theoretical moments, which are defined in the next chapter, 
are all infinite. It also turns out that the mean of a sample of n observa- 
tions is no better than a single observation for estimating the parameter 
$\theta$. The median here is a much better measure of location than the mean. 
This example illustrates the fact that there are no universal methods for 
solving all statistical problems. 
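
The behavior of the Cauchy distribution can be explored by simulation. In the Python sketch below, the value of the parameter, the sample size of 25, the number of repetitions, and the seed are all arbitrary assumptions; observations are generated by the standard inverse-transform device, and spread is measured by the interquartile range because standard deviations are not meaningful here.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable
theta = 0.0     # assumed value of the location parameter

def cauchy_observation(location):
    """One Cauchy observation by the inverse-transform method."""
    return location + math.tan(math.pi * (random.random() - 0.5))

def interquartile_range(data):
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

means, medians, singles = [], [], []
for _ in range(2000):
    sample = [cauchy_observation(theta) for _ in range(25)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))
    singles.append(sample[0])  # a single observation from the same sample

iqr_means = interquartile_range(means)
iqr_medians = interquartile_range(medians)
iqr_singles = interquartile_range(singles)
```

In runs of this kind the sample means scatter about as widely as single observations, whereas the medians cluster much more tightly around the parameter.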


REFERENCES 

Lengthy treatments of moments and other descriptive measures for empirical distribu- 
tions may be obtained in either of the following two books: 

Kendall, M. G., The Advanced Theory of Statistics, Vol. 1, Griffin and Co. 

Yule and Kendall, Introduction to the Theory of Statistics, Griffin and Co. 


EXERCISES 

1. Weights of 300 entering freshmen ranged from 98 to 226 pounds, correct 
to the nearest pound. Determine class boundaries and class marks for the first 
and last class intervals. 

2. The thickness of 400 washers ranged from .421 to .563 inch. Determine 
class boundaries and class marks for the first and last class intervals. 

3. Given the following frequency table of the heights in centimeters of 1000 
students, draw its histogram, indicating the class marks. 

x   155-157  158-160  etc. 

f   26 53 89 146 188 181 125 92 60 22 4 1 1 





4. Given the following frequency table of the diameters in feet of 56 shrubs 
from a common species, (a) draw its histogram and (b) guess by merely inspecting 
the histogram the values of $\bar{x}$ and s. 

x    1   2   3   4   5   6   7   8   9  10  11  12 

f    1   7  11  16   8   4   5   2   1   0   0   1 

5. For the data of problem 4 calculate $\bar{x}$ by (a) definition and (b) the short 
method. 

6. For the data of problem 4 calculate s by (a) definition and (b) the short 
method. 

7. Given the frequency function 

fix) = [5\lx \ (5 - a. = 0, 1, • • • , 5, 

multiply f(x) by 243 and treat the resulting numbers as observed frequencies 
for the corresponding x values. (a) Calculate $\bar{x}$ by definition. (b) Calculate s 
by the short method. 

8. Given the frequency function f(x) =      , x = 1, 2, 3, ..., multiply 
f(x) by 1000 and treat the resulting numbers, after rounding off to the nearest 
integer, as observed frequencies. (a) Calculate $\bar{x}$. (b) Calculate s. 

9. For the histogram of problem 4, using the results in problems 5 and 6, 
calculate the approximate percentages of the data that lie within the intervals 
$\bar{x} \pm s$ and $\bar{x} \pm 2s$. Explain why these percentages are fairly close to normal 
distribution percentages in spite of the obvious non-normality of this distribution. 

10. Show that $\sum_{i=1}^{h} (x_i - \bar{x}) f_i = 0$. 

11. If stature of adult males may be assumed to possess a normal distribution, 
what would you guess the standard deviation of stature to be if you estimate a 
2-standard deviation interval about the mean through your knowledge of male 
stature? 

12. If the scores on a set of examination papers are changed by (a) adding 10 
points to all scores and (b) increasing all scores by 10 per cent, what effects will 
these changes have on the mean and standard deviation? 

13. What would you judge a distribution to be like if the variable can assume 
only positive values and the mean and standard deviation are equal ? 

14. Show that formula (6) in the text is equivalent to the formula 
$s^2 = m_2' - m_1'^2$. 

15. By expanding the binomial in formula (5) in the text and summing term by 
term, derive a formula for calculating the kth moment about the mean in terms 
of the kth and lower-order moments about the origin. 

16. Suppose only the 2 means $\bar{x}_1$ and $\bar{x}_2$ are available from 2 sets of observa- 
tions of sizes $n_1$ and $n_2$ made on the variable x. Show that the mean of the com- 
bined set $\bar{x}$ is given by $\bar{x} = (n_1\bar{x}_1 + n_2\bar{x}_2)/(n_1 + n_2)$. 

17. If the 2 standard deviations $s_1$ and $s_2$ are also available in problem 16, 
show that the standard deviation of the combined set s can be obtained from 

$$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}$$





18. For the data of problem 3, (a) calculate $\bar{x}$ and s by the short method, 
(b) calculate the approximate percentages of the data that lie in the intervals 
$\bar{x} \pm s$ and $\bar{x} \pm 2s$ and compare with normal distribution percentages, (c) calcu- 
late the crude median and mode, and (d) estimate the range for the data. 

19. Given the following 4 mass points, calculate the mean and third moment 
about the mean and explain what this example shows concerning the third 
moment about the mean as a measure of symmetry. A mass of 5 at x = -4; 
a mass of 10 at x = -1; a mass of 10 at x = 2; and a mass of 2 at x = 5. 

20. Sketch a histogram for which you believe the value of s will be large, yet 
for which most of the distribution will be concentrated so that 
the interval $\bar{x} \pm 2s$ will include at least 99 per cent of the data. 



CHAPTER 5 


Theoretical Frequency Distributions of 
One Variable 


5.1 Introduction 

The purpose of this chapter is to introduce a few of the commonly 
used theoretical frequency distributions as models for empirical distri- 
butions. As pointed out and illustrated in 4.1, in some problems one 
knows from theoretical considerations what model should be used. In 
other problems one must rely on samples and experience to determine 
a satisfactory model. Ordinarily the sample is not large enough to deter- 
mine the population distribution with much precision; however, there 
is often enough information in the sample, together with information 
obtained from other sources, to suggest the general type of population 
distribution involved. 

In many problems it suffices to consider certain properties of a distribu- 
tion rather than to study the entire distribution. In particular, it often 
suffices to know the low-order moments of a distribution. This chapter, 
therefore, is concerned with theoretical moments as well as with theoretical 
frequency distributions. 


5.2 Discrete Variables 

Most of the discrete variables that occur in statistical experiments 
are the counting type. For example, the variable might be the number of 
accidents a car owner has per year or the number of insects surviving a 
spraying. Variables such as these assume only non-negative integral 
values. The discrete random variables that are considered here are 
variables of this type, that is, those that assume non-negative integral 
values only. 



5.2.1 Moments 

Before considering particular theoretical frequency distributions for 
discrete variables, a brief discussion of theoretical moments is given 
because of the importance of theoretical moments in determining models 
and in deriving statistical theory. 

Theoretical moments for a discrete variable of the type being considered 
are defined as follows: 

(1) Definition: The kth moment about the origin of a theoretical fre- 
quency distribution with frequency function f(x) is given by 

$$\mu_k' = \sum_{x=0}^{\infty} x^k f(x)$$

If this definition is compared with that of an empirical frequency distribu- 
tion as given by (1), Chapter 4, it will be noted that the probability f(x) 
takes the place of the observed frequency ratio $f_i/n$ in that definition. 

The kth moment of a distribution is also commonly called the kth 
moment of the random variable whose distribution is being studied. 
Thus one may speak of $\mu_k'$ as being the kth moment of x or as the kth 
moment of the distribution of x. 

Since theoretical moments about the mean are used extensively, they 
also need to be defined. As before, it is assumed that the random variable 
x can assume only non-negative integral values. 

(2) Definition: The kth moment about the mean of a theoretical fre- 
quency distribution with frequency function f(x) is given by 

$$\mu_k = \sum_{x=0}^{\infty} (x - \mu)^k f(x)$$

This definition is the theoretical analogue of the corresponding definition 
for empirical distributions as given by (5), Chapter 4. 

Since the first moment about the origin, which is the theoretical mean, 
and the square root of the second moment about the mean, which is the 
theoretical standard deviation, are both used so often, they are given 
special symbols, namely $\mu$ and $\sigma$. Thus $\mu = \mu_1'$ and $\sigma = \sqrt{\mu_2}$. 

In evaluating $\mu_2$ it is usually more convenient to evaluate the first two 
moments about the origin and then calculate $\mu_2$ from them rather than 
evaluate $\mu_2$ directly from definition (2). This is accomplished by ex- 
panding the binomial in (2) for k = 2 in the following manner. 

$$
\begin{aligned}
\mu_2 &= \sum_{x=0}^{\infty} (x - \mu)^2 f(x) \\
&= \sum_{x=0}^{\infty} x^2 f(x) - 2\mu \sum_{x=0}^{\infty} x f(x) + \mu^2 \sum_{x=0}^{\infty} f(x)
\end{aligned}
$$

But from (1) this may be written 

$$\mu_2 = \mu_2' - 2\mu^2 + \mu^2$$

Combining terms, the desired formula is obtained, namely 

(3)  $$\mu_2 = \mu_2' - \mu^2$$
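
Formula (3) can be confirmed exactly on a small example. The Python sketch below uses a hypothetical frequency function on the values 0, 1, 2, 3, chosen only for illustration, and exact fractions so that no rounding enters.

```python
from fractions import Fraction

# Hypothetical frequency function f(x) for x = 0, 1, 2, 3; the probabilities sum to 1.
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def moment_about_origin(freq_function, k):
    """k-th theoretical moment about the origin, as in definition (1)."""
    return sum(x**k * p for x, p in freq_function.items())

mu = moment_about_origin(f, 1)

# Second moment about the mean computed directly from definition (2).
mu2_direct = sum((x - mu) ** 2 * p for x, p in f.items())

# The same quantity from formula (3).
mu2_formula = moment_about_origin(f, 2) - mu**2
```

Because (3) is an algebraic identity, the two computations agree for any frequency function whose first two moments exist.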

5.2.2 Moment Generating Function 

Even though the direct computation of theoretical moments from 
definition (1) may be easy, it is convenient for later theory to be able to 
calculate such moments indirectly by another method. This method is 
introduced here and used throughout several chapters for proving theorems. 
It involves what is known as the moment generating function. As the 
name implies, the moment generating function is a function that generates 
moments. It is defined as follows: 

(4) Definition: The moment generating function of a random variable 
x with frequency function f(x) is given by 

$$M_x(\theta) = \sum_{x=0}^{\infty} e^{\theta x} f(x)$$

This series is a function of the parameter $\theta$ only, but the subscript x is 
placed on $M(\theta)$ to show what variable is being considered. The param- 
eter $\theta$ has no real meaning here; it is merely a mathematical device intro- 
duced to assist in the determination of moments. 

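In order to see how $M_x(\theta)$ does produce moments, assume that f(x) 
is a frequency function for which the series in (4) converges. Now expand 
$e^{\theta x}$ in a power series and sum term by term. Since the power series for $e^z$ is 

$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots$$

it follows from (4) and (1) that 

(5)
$$
\begin{aligned}
M_x(\theta) &= \sum_{x=0}^{\infty} \left[ 1 + \theta x + \frac{\theta^2 x^2}{2!} + \frac{\theta^3 x^3}{3!} + \cdots \right] f(x) \\
&= \sum_{x=0}^{\infty} f(x) + \theta \sum_{x=0}^{\infty} x f(x) + \frac{\theta^2}{2!} \sum_{x=0}^{\infty} x^2 f(x) + \cdots \\
&= 1 + \theta \mu_1' + \frac{\theta^2}{2!} \mu_2' + \frac{\theta^3}{3!} \mu_3' + \cdots
\end{aligned}
$$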

It will be observed that the coefficient of $\theta^k/k!$ in this expansion is the kth 
moment about the origin; consequently, if the moment generating func- 
tion can be found for a variable x and can be expanded into a power series 
in $\theta$, the moments of the variable can be obtained by merely inspecting 
the expansion. If a particular moment is desired, it may be more con- 
venient to evaluate it by computing the proper derivative of $M_x(\theta)$ at 
$\theta = 0$, since repeated differentiation of (5) will show that 

(6)  $$\mu_k' = \left. \frac{d^k M_x}{d\theta^k} \right|_{\theta = 0}$$

Applications of the preceding definitions are begun in the next section. 
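
Relation (6) can be illustrated numerically before any particular distribution is studied. The Python sketch below evaluates the moment generating function of a hypothetical frequency function on 0, 1, 2, 3 and approximates its first two derivatives at zero by central differences; the step size h is an arbitrary assumption, so the agreement is only approximate.

```python
import math

# Hypothetical frequency function f(x) for x = 0, 1, 2, 3.
f = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

def mgf(theta):
    """Moment generating function from definition (4); here the sum is finite."""
    return sum(math.exp(theta * x) * p for x, p in f.items())

h = 1e-3  # step for the numerical derivatives

# Central-difference approximations to the first and second derivatives at theta = 0.
first_derivative = (mgf(h) - mgf(-h)) / (2 * h)
second_derivative = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2

# Moments about the origin computed directly from definition (1).
first_moment = sum(x * p for x, p in f.items())
second_moment = sum(x**2 * p for x, p in f.items())
```

In theoretical work the derivatives would be taken analytically; the numerical version serves only to show that the derivatives at zero reproduce the moments.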


5.2.3 Binomial Distribution 

Consider an experiment of the repetitive type in which only the occur- 
rence or nonoccurrence of an event is recorded. Suppose the probability 
that the event will occur when the experiment is performed is p. Let 
q = 1 - p denote the probability that it will fail to occur. If the event 
occurs at a given trial of the experiment, it will be called a success; other- 
wise a failure. Let n independent trials be made and denote by x the num- 
ber of successes obtained in the n trials. Then consider the problem of 
determining the probability of obtaining precisely x successes in n trials 
of the experiment. A formula for this probability would be needed, for 
example, if one knew that the probability of a marksman hitting a target 
is 1/10 and if one wished to calculate the probability of getting at least 
two hits in taking 20 shots at the target. 

For the purpose of deriving the desired formula, first determine the 
probability of obtaining x consecutive successes followed by n - x 
consecutive failures. These n events are independent; therefore, by (10), 
Chapter 2, this probability is 

$$p^x q^{n-x}$$

The probability of obtaining precisely x successes and n - x failures in 
some other order of occurrence is the same as for this particular order 
because the p's and q's are merely rearranged to correspond to the other 
order. In order to solve the problem, it is therefore necessary to count 
the number of orders. 

The number of orders is the number of permutations possible with n 
letters of which x are alike (p's) and the remaining n - x are alike (q's). 
But by formula (18), Chapter 2, the number of such permutations is equal 
to 

(7)  $$\frac{n!}{x!\,(n - x)!}$$





Now, by (4), Chapter 2, the probability that one or the other of a set of 
mutually exclusive events will occur is the sum of their separate proba- 
bilities; consequently it is necessary to add $p^x q^{n-x}$ as many times as 
there are different orders in which the desired result can occur. Since (7) 
gives the number of such orders, the probability of obtaining x successes 
in some order is therefore given by multiplying $p^x q^{n-x}$ by the quantity in 
(7). The resulting probability, which is that of obtaining x successes in 
n independent trials of an experiment for which p is the probability of 
success in a single trial, defines what is known as the binomial or Bernoulli 
frequency function. Consequently, 

(8) Binomial Distribution:  $$f(x) = \frac{n!}{x!\,(n - x)!}\, p^x q^{n-x}$$

Bernoulli was one of the first mathematicians to develop probability 
theory for discrete variables; hence this distribution has been named after 
him. The more commonly used name of binomial distribution comes from 
the relationship of (8) to the following binomial expansion. 


(9)    \[ (q + p)^n = q^n + n q^{n-1} p + \frac{n(n-1)}{2} q^{n-2} p^2 + \cdots + p^n
                    = \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} \]

From (8) it is clear that (9) may be written

    \[ (q + p)^n = \sum_{x=0}^{n} f(x) \]

Thus the various terms in the binomial expansion of $(q + p)^n$ give the
probabilities of the various possible results in their natural order.

The binomial frequency function is an example of a mathematical 
model that can be applied to many real-life problems involving a discrete 
variable. In any given application it is necessary to know or to estimate 
the values of the two parameters p and n before (8) can be used. 

5.2.3.1 Illustrations. As illustrations of the direct application of formula
(8), first consider two impractical problems related to the rolling of a die.
If a true die is rolled five times, what is the probability that precisely two
of the rolls will show 1's? Here success consists in obtaining a 1; hence
p = 1/6, q = 5/6, and n = 5. When (8) is applied, the solution is

    \[ f(2) = \frac{5!}{2!\,3!} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^3 = .16 \]


If the die is rolled five times, what is the probability of obtaining at
most two 1's? To answer this question it is necessary to compute the
probabilities of obtaining precisely no 1's, one 1, and two 1's. By applying
(8),

    \[ f(0) = \frac{5!}{0!\,5!} \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^5 = .40 \]

    \[ f(1) = \frac{5!}{1!\,4!} \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^4 = .40 \]

Since these three possibilities are mutually exclusive events, it follows that

    \[ P\{x \le 2\} = f(0) + f(1) + f(2) = .96 \]

As a somewhat more earthy problem, consider the one mentioned just
before the derivation of (8), namely, that of calculating the probability
of getting at least two hits on a target in taking 20 shots at it if the
probability of a hit for a single shot is 1/10. Here p = 1/10 and n = 20;
hence

    \[ \begin{aligned}
    P\{x \ge 2\} &= 1 - f(0) - f(1) \\
                 &= 1 - \left(\frac{9}{10}\right)^{20} - 20 \left(\frac{1}{10}\right) \left(\frac{9}{10}\right)^{19} \\
                 &= 1 - .122 - .270 \\
                 &= .608
    \end{aligned} \]
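
The three illustrations above can be reproduced numerically. The following is a minimal sketch in Python, assuming only the standard library; the function name `binomial_pmf` and the variable names are illustrative choices, not notation from the text.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Formula (8): probability of exactly x successes in n independent trials."""
    q = 1.0 - p
    return comb(n, x) * p**x * q ** (n - x)


# True die rolled five times; a success is rolling a 1, so p = 1/6.
exactly_two = binomial_pmf(2, 5, 1 / 6)
at_most_two = sum(binomial_pmf(x, 5, 1 / 6) for x in range(3))

# Marksman with p = 1/10 taking 20 shots: probability of at least two hits.
at_least_two_hits = 1.0 - binomial_pmf(0, 20, 0.1) - binomial_pmf(1, 20, 0.1)

print(round(exactly_two, 2), round(at_most_two, 2), round(at_least_two_hits, 3))
```

Computing the complement for the marksman problem avoids summing the nineteen terms f(2) through f(20).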

The validity of using the binomial model in the last illustration is not
so obvious as it is in the first two illustrations. The derivation of the
binomial formula was based on independent trials with p constant from
trial to trial. If the same man takes repeated shots at the same target, it
might be expected that his chances of making a hit would increase somewhat
with practice. If a different man were used each time, p would
undoubtedly change from trial to trial. Possible deviations in the basic
assumptions should be taken into account when interpreting a resulting
probability such as .608.

5.2.3.2 Binomial Moments. The first two moments of the binomial
distribution will be needed shortly; therefore consider their computation.
In order to illustrate the two methods for computing moments, these
moments are calculated directly from definition and indirectly by means
of the moment generating function.

If (1) is applied to (8) and if a few algebraic manipulations are made,
it will be seen that

(10)   \[ \begin{aligned}
       \mu &= \sum_{x=0}^{n} x\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} \\
           &= \sum_{x=1}^{n} x\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} \\
           &= \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!}\, p^x q^{n-x}
       \end{aligned} \]

If n and p are factored out, this becomes

    \[ \mu = np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-x)!}\, p^{x-1} q^{n-x} \]

Letting y = x - 1, the right side can be written

    \[ \mu = np \sum_{y=0}^{n-1} \frac{(n-1)!}{y!\,(n-1-y)!}\, p^{y} q^{n-1-y} \]

But by (8) the quantity being summed is the probability of y successes in
n - 1 trials. Since the sum is over all possible values of y, the sum must
equal one; hence $\mu = np$.

The second moment is calculated in a similar manner by using the
identity $x^2 = x(x - 1) + x$. From (1) and (10), it follows that

    \[ \begin{aligned}
    \mu'_2 &= \sum_{x=0}^{n} x^2\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} \\
           &= \sum_{x=0}^{n} [x(x-1) + x]\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} \\
           &= \sum_{x=0}^{n} x(x-1)\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} + \mu
    \end{aligned} \]

Since the terms for x = 0 and x = 1 are equal to 0 because of the factor
x(x - 1), the summation can begin with x = 2; hence

    \[ \mu'_2 = \sum_{x=2}^{n} x(x-1)\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} + \mu \]

If $n(n - 1)p^2$ is factored out, this becomes

    \[ \mu'_2 = n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!\,(n-x)!}\, p^{x-2} q^{n-x} + \mu \]

Letting z = x - 2, the right side can be written as

    \[ \mu'_2 = n(n-1)p^2 \sum_{z=0}^{n-2} \frac{(n-2)!}{z!\,(n-2-z)!}\, p^{z} q^{n-2-z} + \mu \]

The quantity being summed is the probability of z successes in n - 2
trials. Since the sum is over all possible values of z, its value must be one.
Using this result and the earlier result that $\mu = np$, $\mu'_2$ reduces to

(11)   \[ \mu'_2 = n(n-1)p^2 + np \]



If formula (3) is applied to the results just obtained for the binomial
distribution,

    \[ \begin{aligned}
    \mu_2 &= n(n-1)p^2 + np - n^2 p^2 \\
          &= -np^2 + np \\
          &= npq
    \end{aligned} \]

These calculations show that the mean and the standard deviation of a
binomial distribution are given by the formulas

(12)   \[ \mu = np, \qquad \sigma = \sqrt{npq} \]


Now consider the computation of these moments by means of the
moment generating function. If (4) is applied to (8),

    \[ \begin{aligned}
    M_x(\theta) &= \sum_{x=0}^{n} e^{\theta x}\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} \\
                &= \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\, (pe^{\theta})^x q^{n-x}
    \end{aligned} \]

But from (9) this sum can be written as a binomial raised to the nth power
because the expansion is purely algebraic and need not be interpreted in
terms of probabilities. Hence

(13)   \[ M_x(\theta) = (q + pe^{\theta})^n \]

The desired moments may be obtained by applying (6). If (13) is
differentiated twice with respect to $\theta$ and terms are combined,

    \[ M'(\theta) = npe^{\theta} (q + pe^{\theta})^{n-1} \]

and

    \[ M''(\theta) = npe^{\theta} (q + pe^{\theta})^{n-2} (q + npe^{\theta}) \]

The values of these derivatives at $\theta = 0$ are np and np(q + np),
respectively; hence they are the values of $\mu$ and $\mu'_2$, respectively.
If q is replaced by 1 - p, it will be observed that $\mu'_2$ here agrees with
the value obtained in (11). For this problem the moments are easier to obtain
indirectly by means of the moment generating function than directly from
definition.
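
Both routes to the binomial moments can be checked numerically. The sketch below, in Python with only the standard library, sums the definition directly and also differentiates the moment generating function (13) by central differences; the parameter values n = 20 and p = 0.1, the step size h, and all function names are assumptions chosen for illustration.

```python
from math import comb, exp, isclose


def binomial_pmf(x, n, p):
    """Formula (8)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def moment_about_origin(k, n, p):
    """kth moment about the origin, summed directly from the definition."""
    return sum(x**k * binomial_pmf(x, n, p) for x in range(n + 1))


def mgf(theta, n, p):
    """Formula (13): (q + p e^theta)^n."""
    return (1 - p + p * exp(theta)) ** n


n, p = 20, 0.1  # illustrative values
q = 1 - p

mean = moment_about_origin(1, n, p)
second = moment_about_origin(2, n, p)
variance = second - mean**2

# Central-difference approximations to M'(0) and M''(0).
h = 1e-4
mgf_first = (mgf(h, n, p) - mgf(-h, n, p)) / (2 * h)
mgf_second = (mgf(h, n, p) - 2 * mgf(0.0, n, p) + mgf(-h, n, p)) / h**2

print(mean, second, variance)
print(mgf_first, mgf_second)
```

The finite-difference derivatives are only approximate, so they should agree with np and np(q + np) to a few decimal places rather than exactly.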


5.2.4 Poisson Distribution

If the number of trials n is large, the computations involved in using
formula (8) become quite lengthy; therefore, a convenient approximation
to the binomial distribution would be very useful. It turns out that for
large n there are two well-known frequency functions that give good
approximations to the binomial frequency function: one when p is very
small and the other when this is not the case. The approximation that
applies when p is very small is known as the Poisson frequency function
and is defined by

(14)   Poisson Distribution:  \[ f(x) = \frac{e^{-\mu} \mu^x}{x!} \]

It will presently be seen that the parameter $\mu$ is the mean of the
distribution; hence, it is proper to label it $\mu$. Although the Poisson
distribution is being introduced here as an approximation to the binomial
distribution, it is a well-known and useful distribution in its own right
and therefore should not be regarded as merely an approximation for the
binomial distribution. It has been named after another pioneer in the
theory of probability.

5.2.4.1 Poisson Approximation to the Binomial. In order to verify the
fact that (14) does serve as a good approximation to the binomial
distribution for very large n and very small p, consider what happens to the
binomial frequency function when n becomes infinite and p approaches
zero in such a manner that the mean $\mu = np$ remains fixed.

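First, rewrite (8) as follows:

    \[ f(x) = \frac{n(n-1) \cdots (n-x+1)}{x!}\, p^x (1-p)^{n-x} \]

If numerator and denominator are multiplied by $n^x$ and the indicated
algebraic manipulations are performed,

(15)   \[ \begin{aligned}
       f(x) &= \frac{n(n-1) \cdots (n-x+1)}{n^x\, x!}\, (np)^x (1-p)^{n-x} \\
            &= \frac{n(n-1) \cdots (n-x+1)}{n \cdot n \cdots n}\, \frac{\mu^x}{x!}\, (1-p)^{n-x} \\
            &= \frac{\left(1 - \frac{1}{n}\right) \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{x-1}{n}\right)}{(1-p)^x}\, \frac{\mu^x}{x!}\, \left[(1-p)^{-1/p}\right]^{-\mu}
       \end{aligned} \]

Now, from the definition of e,

    \[ \lim_{z \to 0} (1 + z)^{1/z} = e \]

hence, letting z = -p,

    \[ \lim_{p \to 0} \left[(1-p)^{-1/p}\right]^{-\mu} = e^{-\mu} \]

Furthermore,

    \[ \lim_{n \to \infty} \frac{\left(1 - \frac{1}{n}\right) \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{x-1}{n}\right)}{(1-p)^x} = 1 \]

because $p \to 0$ as $n \to \infty$ when $np = \mu$ is fixed. By applying
these two results to the right side of (15), it will be seen that

    \[ \lim_{n \to \infty} f(x) = \frac{e^{-\mu} \mu^x}{x!} \]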

This result may be expressed as a theorem. 

Theorem 1: If the probability of success in a single trial p approaches
0 while the number of trials n becomes infinite in such a manner that the
mean $\mu = np$ remains fixed, then the binomial distribution will approach
the Poisson distribution with mean $\mu$.
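
The convergence asserted by Theorem 1 can be observed numerically. The following Python sketch holds the mean fixed at 4 and measures the largest pointwise gap between the binomial and Poisson frequency functions as n grows; the particular values of n, the cutoff `x_max`, and the function names are assumptions made for illustration and are not taken from the figures in the text.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Formula (8)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    """Formula (14)."""
    return exp(-mu) * mu**x / factorial(x)


def max_gap(n, mu, x_max=20):
    """Largest |binomial - Poisson| over x = 0, ..., min(n, x_max), with p = mu / n."""
    p = mu / n
    return max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu))
        for x in range(min(n, x_max) + 1)
    )


mu = 4.0
gaps = [max_gap(n, mu) for n in (12, 100, 1000)]
for n, gap in zip((12, 100, 1000), gaps):
    print(n, gap)
```

The gap should shrink as n increases with np held fixed, which is the behaviour the theorem describes.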

Figures 1 and 2 were constructed to indicate how rapidly the binomial
distribution approaches the Poisson distribution. The broken lines
represent the fixed Poisson distribution for $\mu$ chosen equal to 4 and the
solid lines the binomial distribution for two different pairs of values of
n and p, respectively.

It appears from inspecting these graphs that the Poisson approximation
should be sufficiently accurate for most applications if $n \ge 100$ and
$p \le .05$.

5.2.4.2 Applications. As an illustration of the use of the Poisson
distribution as an approximation to the binomial distribution, consider the
problem of calculating the probability that at most five defective fuses
will be found in a box of 200 fuses if experience shows that 2 per cent of
these fuses are defective. Here $\mu = np = 200(.02) = 4$; hence, using
(14), the approximate answer is given by

    \[ \begin{aligned}
    P\{x \le 5\} &= \sum_{x=0}^{5} \frac{e^{-4} 4^x}{x!} \\
                 &= e^{-4} \left[ 1 + 4 + \frac{4^2}{2!} + \frac{4^3}{3!} + \frac{4^4}{4!} + \frac{4^5}{5!} \right] \\
                 &= .785
    \end{aligned} \]


Lengthy calculations using (8) yield the answer .788 ; hence the approxi- 
mation is very good here. 
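
The comparison in the fuse problem can be carried out by machine. This is a small Python sketch, assuming the standard library only and using illustrative variable names, that evaluates the Poisson sum from (14) and the corresponding binomial sum from (8) so the two can be set side by side.

```python
from math import comb, exp, factorial

n, p = 200, 0.02          # box of 200 fuses, 2 per cent defective
mu = n * p                # Poisson mean, np = 4

# P{x <= 5} under the Poisson approximation (14).
poisson_cdf = sum(exp(-mu) * mu**x / factorial(x) for x in range(6))

# P{x <= 5} from the binomial formula (8), summed term by term.
binomial_cdf = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(6))

print(round(poisson_cdf, 3), round(binomial_cdf, 3))
```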

As an illustration of an empirical distribution that may be thought of
as possessing Poisson characteristics, consider the data of Table 1 on the
distribution of yeast cells in the 400 squares of a hemacytometer.

The procedure for obtaining the observed frequencies consists in diluting
the yeast cells in a liquid, thoroughly mixing the dilution, filling a counting
chamber that has been ruled into 400 squares with the mixture, and then
counting the number of yeast cells on each square under a microscope.
It is possible to conceive of these data as having come from a binomial
population by reasoning in the following manner. If the mixture is
thought of as consisting of yeast cells and groups of molecules of the
liquid about equal in size to the yeast cells, the yeast cells will constitute
only a very small percentage of such units of volume; nevertheless, the
total number of such units on one square of the hemacytometer is so
large that several yeast cells may be found among them. The number of
trials here corresponds to the total number of units on a square, and the
number of successes corresponds to the number of yeast cells on the
square. If the mixing has been thorough, one would expect the yeast
cells to be distributed uniformly throughout the mixture and the units on
a square to constitute a set of independent trials.


Table 1

No. cells (x) per square     0    1    2    3    4    5    6    7    8    9   10
Observed frequency                                                             0
Expected frequency         107  141   93   41   14    4    1    0    0    0    0


The mean of x for the empirical distribution given in Table 1 will be
found to be $\bar{x} = 1.32$. If it is assumed on the basis of the preceding
discussion that x possesses a Poisson distribution and if the value of $\mu$
is approximated well by $\bar{x}$, the frequencies that would be expected here
may be obtained to a good approximation from (14) by computing the
successive values of

(16)   \[ 400\, \frac{e^{-1.32} (1.32)^x}{x!} \]

The results of such computations correct to the nearest unit are given in
the third row of Table 1. There appears to be excellent agreement here.
By the expected frequency for a given value of x is meant the mean number
of successes for that value of x when the problem is treated as a binomial
problem in which n = 400 and $p = e^{-1.32} (1.32)^x / x!$; hence (16) gives
np for the binomial problem when $\mu = 1.32$.
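
The expected frequencies in the third row of Table 1 follow directly from (16). A minimal Python sketch, with variable names chosen here for illustration, is:

```python
from math import exp, factorial

squares = 400        # number of squares on the hemacytometer
mean_count = 1.32    # sample mean used as the estimate of mu

# Formula (16), rounded to the nearest unit, for x = 0, 1, ..., 10.
expected = [
    round(squares * exp(-mean_count) * mean_count**x / factorial(x))
    for x in range(11)
]
print(expected)
```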

If there had been poor agreement between the observed and expected
frequencies here, the Poisson model would have been considered
unacceptable. Since any errors introduced by replacing the unknown $\mu$ by
its sample estimate $\bar{x}$ would be very small because $\bar{x}$ is based on 400
observations and the Poisson approximation to the binomial model as
described above is certainly excellent, any disagreement between observed
and expected frequencies would have been attributed to the binomial
assumptions not being satisfied. Thus, if the yeast cells had not been
mixed thoroughly, or if there had been a tendency for the yeast cells to
cluster, the binomial assumptions would have been questioned. Since
experience has shown that the Poisson model is a valid model for
techniques of this kind, the Poisson distribution can be used to check on
the soundness of these techniques.

The preceding illustration is an example of a spatial type distribution. 
Variables distributed over time or space can often be assumed to possess 
Poisson distributions. For example, the Poisson distribution has been 
found to be a satisfactory model for the number of disintegrating atoms 
from a radioactive substance, or for the number of telephone calls on a line, 
in a fixed time interval. The number of meteorites found on an acre of 
desert land is another spatial variable to which the Poisson distribution is 
applicable. 

If one assumes that the number of events occurring in a time interval 
is independent of the number that occurred in earlier time intervals and
one makes a few other plausible assumptions, then it can be shown that 
the number of occurrences will possess a Poisson distribution. The 
same type of derivation can be applied to the number of events occurring 
in a region in space. Thus the Poisson distribution is a useful distribution 
independent of its use as an approximation for the binomial distribution. 

In Chapter 3 it was stated that point estimates of parameters would
usually be obtained by applying the maximum likelihood principle given
by (11), Chapter 3. In the preceding illustration of how yeast cells are
distributed, $\bar{x}$ was used to estimate the parameter $\mu$ of a Poisson
distribution. To verify that $\bar{x}$ is the maximum likelihood estimator of
$\mu$, calculate the likelihood function using (10), Chapter 3, and (14). Thus

    \[ L = \prod_{i=1}^{n} \frac{e^{-\mu} \mu^{x_i}}{x_i!}
         = \frac{e^{-n\mu}\, \mu^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!} \]

Taking logarithms and differentiating,

    \[ \frac{d \log L}{d\mu} = -n + \frac{1}{\mu} \sum_{i=1}^{n} x_i \]

The maximum likelihood estimator $\hat{\mu}$ is given by setting
$d \log L / d\mu$ equal to 0 and solving for $\mu$, which gives

    \[ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x} \]
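
The conclusion that the sample mean maximizes the Poisson likelihood can be illustrated with a crude grid search. The sketch below is in Python; the eight counts are hypothetical data invented for the example, and the grid spacing of 0.05 is an arbitrary assumption.

```python
from math import lgamma, log

# Hypothetical sample of counts, invented for illustration only.
counts = [0, 1, 1, 2, 3, 1, 0, 2]


def log_likelihood(mu, xs):
    """log L for a Poisson sample; lgamma(x + 1) equals log(x!)."""
    return sum(-mu + x * log(mu) - lgamma(x + 1) for x in xs)


sample_mean = sum(counts) / len(counts)

# Evaluate log L on the grid 0.05, 0.10, ..., 5.00 and keep the maximizer.
grid = [0.05 * k for k in range(1, 101)]
best = max(grid, key=lambda mu: log_likelihood(mu, counts))

print(sample_mean, best)
```

Because log L is concave in the parameter, the grid point nearest the sample mean is the one selected.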




5.3 Continuous Variables

In the preceding sections two particular discrete frequency functions
were studied. In the next few sections particular continuous frequency
functions will be studied. Since it will be necessary to calculate the
moments of these distributions, the definition of the kth moment for
continuous distributions is considered first.


5.3.1 Moments

Let f(x) be a continuous frequency function which is zero outside a
finite interval (a, b). Figure 3 gives the graph of such a function. Let
the interval (a, b) be divided into n equal subintervals and let $x_i$ be the
midpoint of the ith subinterval. Form

(17)   \[ \sum_{i=1}^{n} x_i^k f(x_i)\, \Delta x \]

where $\Delta x$ is the width of a subinterval. The quantity $f(x_i)\,\Delta x$
is the area of the shaded rectangle; hence $x_i^k f(x_i)\,\Delta x$ represents
the approximate kth moment of the area of this rectangle about the origin
and (17) represents the sum of such approximate kth moments of area. Since
the rectangles approximate the area under the curve, the natural procedure
is to define the kth moment of f(x) as the limit of this sum as the
width of the subinterval approaches 0. Thus the kth moment of a continuous
distribution with frequency function f(x) is defined by

(18)   \[ \mu'_k = \int_a^b x^k f(x)\, dx \]

It is often desirable to consider the moments of a function of x, say g(x),
rather than of x itself. For example, if $g(x) = x - \mu$, then the kth
moment of g(x) would be the kth moment of x about its mean. A
definition in terms of an arbitrary function g(x) will enable one to shift
from moments about the origin to moments about the mean and also
to consider various other useful changes of variable. Such a definition is
the following:

(19)   Definition: If f(x) is the frequency function of the random variable
x, the kth moment of the function g(x) is given by

    \[ \mu'_{k:g(x)} = \int_{-\infty}^{\infty} [g(x)]^k f(x)\, dx \]

If f(x) is positive for all values of x, the limits $-\infty$ and $\infty$ are
required; however, if f(x) is zero over part of the x axis, there is still no
harm done in using these limits.


5.3.2 Moment Generating Function


The moment generating function for a continuous variable is defined
by analogy with (4) to be

(20)   \[ M_x(\theta) = \int_{-\infty}^{\infty} e^{\theta x} f(x)\, dx \]

If $e^{\theta x}$ is expanded in a power series and if the integration is
performed term by term, it will be found that $M_x(\theta)$ will assume the
same expanded form as that in (5); hence (20) generates moments in the same
manner as (4) does.

In order to be able to generate moments of the type given by (19), it is
necessary to generalize the definition of the moment generating function.
From the manner in which $M_x(\theta)$ generates moments, it is clear that
moments of g(x) will be generated if $e^{\theta x}$ is replaced by
$e^{\theta g(x)}$ in (20). The desired definition is the following:

(21)   Definition: If f(x) is the frequency function of the random variable
x, the moment generating function of g(x) is given by

    \[ M_{g(x)}(\theta) = \int_{-\infty}^{\infty} e^{\theta g(x)} f(x)\, dx \]

This generalized form of the moment generating function is used to
derive a number of theorems, but in such derivations two properties of
moment generating functions are needed; therefore, consider those
properties now.

Let c be any constant and let h(x) be a function of x for which the moment
generating function exists. Then, since g(x) in (21) represents an arbitrary
function, g(x) may be chosen as g(x) = ch(x); consequently,

    \[ M_{ch(x)}(\theta) = \int_{-\infty}^{\infty} e^{\theta c h(x)} f(x)\, dx = M_{h(x)}(c\theta) \]

The second property is obtained by choosing g(x) = h(x) + c. Then

    \[ \begin{aligned}
    M_{h(x)+c}(\theta) &= \int_{-\infty}^{\infty} e^{\theta [h(x) + c]} f(x)\, dx \\
                       &= e^{c\theta} \int_{-\infty}^{\infty} e^{\theta h(x)} f(x)\, dx \\
                       &= e^{c\theta} M_{h(x)}(\theta)
    \end{aligned} \]

If h(x) is replaced by g(x), these results may be summarized in two
important formulas.

(22)   Properties: If c is any constant and g(x) is any function for which
the moment generating function exists,

    (i)   \[ M_{cg(x)}(\theta) = M_{g(x)}(c\theta) \]

    (ii)  \[ M_{g(x)+c}(\theta) = e^{c\theta} M_{g(x)}(\theta) \]

These two properties enable one to dispose of a bothersome constant c
which multiplies, or is added to, a function g(x). By replacing integrals
by sums it is easily shown that these formulas apply to discrete variables
also. It is assumed that g(x) and f(x) are such that the integral in (21),
or the corresponding sum, is finite. This implies that all the moments of
g(x) are finite. Applications of the preceding formulas are made in the
following sections.

5.3.3 Rectangular Distribution 

Perhaps the simplest continuous frequency function is the one: that is 
constant over some interval (a, b) and is 0 elsewhere. This frequency 
function defines what is known as the rectangular or uniform distribu- 
tion; hence 

(23)   Rectangular Distribution:
       \[ f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases} \]

The graph of a typical rectangular distribution is given in Fig. 4.

Fig. 4. A rectangular distribution.

The rectangular distribution arises, for example, in the study of rounding
errors when measurements are recorded to a certain accuracy. Thus,
if measurements of daily temperatures are recorded to the nearest degree,
it would be assumed that the difference in degrees between the true
temperature and the recorded temperature is some number between
-.5 and .5 and that the error is uniformly distributed throughout this
interval.

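The kth moment of the rectangular distribution is easy to compute.
For example, if a = 0 and b = 1, application of (18) to (23) gives

    \[ \mu'_k = \int_0^1 x^k\, dx = \frac{1}{k+1} \]

The moment generating function is also easy to compute. Application
of (20) to (23) gives

    \[ M_x(\theta) = \int_0^1 e^{\theta x}\, dx = \frac{e^{\theta} - 1}{\theta} \]

If one wished to obtain the kth moment from $M_x(\theta)$, it would be
necessary to expand $e^{\theta}$ and simplify as follows:

    \[ \begin{aligned}
    M_x(\theta) &= \frac{1}{\theta} \left[ 1 + \theta + \frac{\theta^2}{2!} + \frac{\theta^3}{3!} + \cdots - 1 \right] \\
                &= 1 + \frac{\theta}{2!} + \frac{\theta^2}{3!} + \cdots + \frac{\theta^k}{(k+1)!} + \cdots
    \end{aligned} \]

Since $\mu'_k$ is the coefficient of $\theta^k / k!$, it will be seen from this
expansion that $\mu'_k = k!/(k+1)! = 1/(k+1)$, which agrees with the
preceding result.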
This computation was made for the purpose of becoming familiar with 
the moment generating function and not as a suggested method for com- 
puting the moments. The direct computation is obviously much simpler 
here. 
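
The limiting sum (17) that motivates definition (18) can be evaluated numerically for the rectangular distribution on (0, 1) and compared with 1/(k + 1). The following Python sketch assumes 10,000 subintervals; that number and the function names are illustrative choices.

```python
def rectangular(x, a=0.0, b=1.0):
    """Formula (23): constant 1/(b - a) on (a, b), zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def midpoint_moment(k, f, a, b, n=10_000):
    """Sum (17): x_i^k f(x_i) dx over the midpoints of n equal subintervals."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_i = a + (i + 0.5) * dx  # midpoint of the ith subinterval
        total += x_i**k * f(x_i) * dx
    return total


approx = {k: midpoint_moment(k, rectangular, 0.0, 1.0) for k in range(1, 5)}
for k, value in approx.items():
    print(k, value, 1 / (k + 1))
```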

The rectangular frequency function is of somewhat limited use as a 
model for real-life distributions; however, it is of considerable theoretical 
value and is the simplest continuous frequency function on which to illus- 
trate general formulas. 


5.3.4 Normal Distribution 

The histograms shown in Fig. 1 and Fig. 5 of Chapter 4 are examples
of distributions whose general characteristics are encountered rather
often. These two distributions are quite symmetrical, die out rather
quickly at the tails, and possess a shape much like that of a bell. A
mathematical model that has proved very useful for distributions such as
these, and which presently will be seen to be very important theoretically,
is a distribution called the normal or Gaussian distribution. It is defined
as

(24)   Normal Distribution:  \[ f(x) = c\, e^{-\frac{1}{2} \left( \frac{x-a}{b} \right)^2} \]

Here a, b, and c are parameters that make f(x) a frequency function.
For example, c must be such that the area under the graph of f(x) is equal
to one.

5.3.4.1 Moments. The graph of a typical normal curve is given in
Fig. 5. From (24) it is clear that the curve is symmetrical about the line
x = a; hence by symmetry the mean must be given by $\mu = a$.

Fig. 5. Typical normal distribution.

Instead of finding the moments directly, they may be found indirectly
by means of the moment generating function. Furthermore, since it is
easier here to find moments about the mean than about the origin,
consider the evaluation of $M_{x-\mu}(\theta)$. From definition (21), with g(x)
chosen equal to $x - \mu$,

    \[ M_{x-\mu}(\theta) = c \int_{-\infty}^{\infty} e^{\theta (x-\mu) - \frac{1}{2} \left( \frac{x-\mu}{b} \right)^2} dx \]

Let $z = (x - \mu)/b$; then dx = b dz and

    \[ M_{x-\mu}(\theta) = bc \int_{-\infty}^{\infty} e^{\theta b z - \frac{z^2}{2}}\, dz \]

Complete the square in the exponent as follows:

    \[ \theta b z - \frac{z^2}{2} = -\frac{1}{2} (z - \theta b)^2 + \frac{\theta^2 b^2}{2} \]

Then,

    \[ M_{x-\mu}(\theta) = bc\, e^{\frac{b^2 \theta^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2} (z - \theta b)^2}\, dz \]

If $t = z - \theta b$, then dz = dt and

    \[ M_{x-\mu}(\theta) = bc\, e^{\frac{b^2 \theta^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\, dt \]

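The value of this integral can be found in any standard table of integrals.
Or it may be evaluated directly by the following device. Let

    \[ I = \int_0^{\infty} e^{-\frac{t^2}{2}}\, dt \]

Then

    \[ I^2 = \int_0^{\infty} e^{-\frac{x^2}{2}}\, dx \int_0^{\infty} e^{-\frac{y^2}{2}}\, dy
           = \int_0^{\infty} \int_0^{\infty} e^{-\frac{x^2 + y^2}{2}}\, dy\, dx \]

In polar coordinates this double integral assumes the form

    \[ I^2 = \int_0^{\pi/2} \int_0^{\infty} e^{-\frac{r^2}{2}}\, r\, dr\, d\theta
           = \int_0^{\pi/2} d\theta = \frac{\pi}{2} \]

Hence

    \[ \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\, dt = 2I = \sqrt{2\pi} \]

and

(25)   \[ M_{x-\mu}(\theta) = \sqrt{2\pi}\, bc\, e^{\frac{b^2 \theta^2}{2}} \]

From (5) it follows that for any moment generating function M(0) = 1;
hence from (25) it follows that $\sqrt{2\pi}\, bc = 1$ and that

(26)   \[ M_{x-\mu}(\theta) = e^{\frac{b^2 \theta^2}{2}} \]

If this exponential is expanded in a power series,

    \[ M_{x-\mu}(\theta) = 1 + b^2 \frac{\theta^2}{2!} + 3b^4 \frac{\theta^4}{4!} + \cdots \]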

Since the odd powers of $\theta$ are missing, the odd moments of x about its
mean $\mu$ must be 0, which of course is true for any symmetrical
distribution possessing such moments. The coefficient of $\theta^2/2!$ is the
second moment of x about its mean; therefore, $\sigma^2 = b^2$, or $b = \sigma$.
Since $\sqrt{2\pi}\, bc = 1$, $c = 1/(\sigma \sqrt{2\pi})$; consequently, (24) can
be written in the form

(27)   \[ f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2} \]

This result shows that a normal distribution is completely determined
by specifying its mean and standard deviation. It should be noted that
the only difference between (24) and (27) is that the parameters in (24)
have now been reduced to two independent parameters which have been
given statistical meaning.

A formula for $M_x(\theta)$, expressed in terms of statistical parameters, will
be needed in subsequent sections. It can be obtained from (26) by replacing
$b^2$ with $\sigma^2$ and using the second of the two properties in (22) with
g(x) = x and $c = -\mu$. These substitutions yield the result

(28)   \[ M_x(\theta) = e^{\mu \theta + \frac{\sigma^2 \theta^2}{2}} \]


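For the purpose of interpreting the standard deviation geometrically,
consider the points of inflection of a normal curve. When (27) is
differentiated twice,

    \[ f'(x) = -\frac{x-\mu}{\sigma^3 \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2} \]

    \[ f''(x) = -\frac{1}{\sigma^3 \sqrt{2\pi}} \left[ 1 - \left( \frac{x-\mu}{\sigma} \right)^2 \right] e^{-\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2} \]

From the first derivative it is clear that there is but one maximum point,
which occurs at $x = \mu$. From the second derivative it follows that points
of inflection occur at $x = \mu \pm \sigma$. Geometrically, then, the standard
deviation is the distance from the axis of symmetry to a point of inflection.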

In Chapter 4 meaning was given to the standard deviation as a measure
of variation by stating that for histograms approximating a normal
curve the interval $\bar{x} \pm s$ included about 68 per cent of the data;
$\bar{x} \pm 2s$ included about 95 per cent of the data. This property will now
be verified. For (27) the probability that x will fall in the interval
$(\mu - \sigma, \mu + \sigma)$ is given by

    \[ P\{\mu - \sigma < x < \mu + \sigma\} = \int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2} dx \]

When $t = (x - \mu)/\sigma$, then $dx = \sigma\, dt$ and

    \[ P\{\mu - \sigma < x < \mu + \sigma\} = \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} e^{-\frac{t^2}{2}}\, dt
       = 2 \cdot \frac{1}{\sqrt{2\pi}} \int_0^1 e^{-\frac{t^2}{2}}\, dt \]

The value of the last integral multiplied by the factor $1/\sqrt{2\pi}$, found
in Table II in the back of the book, is .3413. Hence the value of the desired
integral is .68, correct to two digits. For the limits $\mu \pm 2\sigma$ one may
verify that $t = \pm 2$ and that the area between is .95. The unit of
measurement given by $t = (x - \mu)/\sigma$ is called a standard unit. Table II
is therefore a table for the normal distribution with 0 mean and unit
standard deviation, that is, for standard units.
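
In place of a printed table, areas under the standard normal curve can be computed from the error function. The sketch below is in Python and uses `math.erf` from the standard library; the helper name `standard_normal_area` is an illustrative choice.

```python
from math import erf, sqrt


def standard_normal_area(t1, t2):
    """Area under the standard normal curve between t1 and t2 (standard units)."""
    def cdf(t):
        return 0.5 * (1.0 + erf(t / sqrt(2.0)))

    return cdf(t2) - cdf(t1)


half_area = standard_normal_area(0.0, 1.0)    # the tabled quantity for t = 1
within_one = standard_normal_area(-1.0, 1.0)  # interval mu +/- sigma
within_two = standard_normal_area(-2.0, 2.0)  # interval mu +/- 2 sigma

print(round(half_area, 4), round(within_one, 2), round(within_two, 2))
```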

5.3.4.2 Fitting to Histograms. Consider the problem of fitting a normal
curve to a histogram. If one has reasons for believing that a set of data 
represents a random sample from some normal population, then the fitted 
normal curve would serve as an approximation to the population curve. 
Since a normal distribution is completely determined by its mean and 
standard deviation and these quantities can be rather accurately estimated 
for n fairly large, one would have considerably more confidence in the 
fitted normal curve as representing the population distribution than in 
the histogram of the data as doing so. There is not much occasion to fit 
normal curves to histograms. Frequency curve fitting is important in 
some statistical fields; however, for most statistical purposes it is princi- 
pally an exercise to acquaint the student with the normal curve and with 
the extent to which normal data are found in statistical practice. 

As an illustration of the technique of fitting a normal curve to a
histogram, consider once more the data of Table 2, Chapter 4, for which the
histogram is shown in Fig. 5, Chapter 4. These data are also given in
Table 2 of this chapter, and the histogram is shown in Fig. 6.

Table 2

Class        t =            Area to      Area for interval   Theoretical   Observed
boundaries   (x - 475)/151  left of t    to left of t        frequency     frequency
x                           A            ΔA                  nΔA

 99.5        -2.49          .0064        .0064                 6.4            6
199.5        -1.82          .0344        .0280                28.0           28
299.5        -1.16          .1230        .0886                88.6           88
399.5        -0.50          .3085        .1855               185.5          180
499.5         0.16          .5636        .2551               255.1          247
599.5         0.82          .7939        .2303               230.3          260
699.5         1.49          .9319        .1380               138.0          133
799.5         2.15          .9842        .0523                52.3           42
899.5         2.81          .9975        .0133                13.3           11
999.5         3.47          .9997        .0022                 2.2            5

Fig. 6. Normal curve fitted to histogram.

Since $\mu$ and $\sigma$ are unknown, they must be estimated from the data. The
methods explained in Chapter 3 show that the maximum likelihood estimators
of $\mu$ and $\sigma$ are given by $\hat{\mu} = \bar{x}$ and $\hat{\sigma} = s$; therefore,
the estimates $\bar{x} = 475$ and s = 151 are used. Then, by (27), the desired
fitted normal frequency function is

(29)   \[ f(x) = \frac{1}{151 \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x-475}{151} \right)^2} \]


The graph of this function, of course, has unit area, hence f(x) must be
multiplied by the total area of the histogram if it is to fit the histogram.
However, except for the purpose of seeing how well the curve fits, it is
not necessary to calculate ordinates, since the agreement between the
fitted curve and the histogram is determined by comparing the
corresponding areas under the curve and the histogram for the various class
intervals. In the fitting technique it is therefore convenient to work with
percentage areas under the normal curve. These percentage areas for
the various class intervals of the histogram are calculated systematically
by starting with the first class interval. Now, to any value of x, say $x_0$,
for the curve (29) there corresponds a value $t_0 = (x_0 - 475)/151$ for the
standard normal curve

(30)   \[ y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}} \]

such that the percentage of area to the left of $x_0$ in (29) is the same as
the percentage of area to the left of $t_0$ in (30). For, since
t = (x - 475)/151, dx = 151 dt and

    \[ \int_{-\infty}^{x_0} \frac{1}{151 \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x-475}{151} \right)^2} dx
       = \int_{-\infty}^{t_0} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, dt \]

The value of this integral can be obtained from Table II. The procedure
for finding these normal curve frequencies is illustrated in Table 2.
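
The columns of Table 2 can be regenerated by machine. The following Python sketch is one way to do so under stated assumptions: the standard normal area is computed with `math.erf` instead of being read from Table II, t is rounded to two decimals to mimic a table lookup, and n is taken as the total of the observed frequencies. All names are illustrative.

```python
from math import erf, sqrt


def phi(t):
    """Area under the standard normal curve (30) to the left of t."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


mean, sd = 475.0, 151.0                       # estimates used in (29)
observed = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]
n = sum(observed)                             # total number of observations
boundaries = [99.5 + 100.0 * i for i in range(10)]

rows = []
previous_area = 0.0
for x in boundaries:
    t = round((x - mean) / sd, 2)             # standard units, two decimals
    area = phi(t)                             # area to the left of t
    interval_area = area - previous_area      # area for the class interval
    rows.append((x, t, area, interval_area, n * interval_area))
    previous_area = area

for x, t, area, interval_area, frequency in rows:
    print(f"{x:6.1f} {t:6.2f} {area:7.4f} {interval_area:7.4f} {frequency:7.1f}")
```

Small differences in the last digit of the theoretical frequencies are to be expected, because the table entries were formed from areas already rounded to four decimals.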

The agreement seems to be excellent except for the rather large dif- 
ference between 230.3 and 260. The extent of such discrepancies is more 
readily realized by comparing the graphs of the histogram and the fitted 
normal curve as shown in Fig. 6. The question whether the fit may be 
considered satisfactory is considered in a later chapter. 

5.3.4.3 Applications. The interesting and important applications of
normal distributions are considered in later chapters after further essential 
theory has been developed. Here, only one simple illustration of its direct 
applicability is given. 

Many college instructors of large classes assign letter grades on
examinations by means of the normal distribution. The procedure followed is
to ignore that part of the distribution lying outside the interval
$\mu \pm 2.5\sigma$, or $\mu \pm 3\sigma$, and then divide this interval into five
equal parts corresponding to the letter grades F, D, C, B, and A. If
$\mu \pm 2.5\sigma$ is used, each interval will be $\sigma$ units in length;
consequently, the six values of x determining these five intervals will be
$\mu - 2.5\sigma$, $\mu - 1.5\sigma$, $\mu - 0.5\sigma$, $\mu + 0.5\sigma$,
$\mu + 1.5\sigma$, and $\mu + 2.5\sigma$. The corresponding values of
$t = (x - \mu)/\sigma$ will be -2.5, -1.5, -0.5, 0.5, 1.5, and 2.5. From
Table II it will be found that the areas within these five intervals are
.06, .24, .38, .24, and .06, respectively. Since these percentages do not
total 100 per cent, it is customary to allow the two end intervals to extend
to infinity. Then the percentages of students who will be assigned the
corresponding letter grades are 7 per cent F, 24 per cent D, 38 per cent C,
24 per cent B, and 7 per cent A.
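
The grade percentages can be checked with a few lines of Python. This sketch assumes the end intervals extend to infinity, as described above, and uses `math.erf` in place of Table II; the names `phi`, `cuts`, and `grades` are illustrative.

```python
from math import erf, sqrt


def phi(t):
    """Area under the standard normal curve to the left of t."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


# Interior cut points in standard units; the end intervals run to infinity.
cuts = [-1.5, -0.5, 0.5, 1.5]
cumulative = [0.0] + [phi(t) for t in cuts] + [1.0]
shares = [cumulative[i + 1] - cumulative[i] for i in range(5)]

# Percentages, to the nearest per cent, for grades F, D, C, B, A in that order.
grades = dict(zip("FDCBA", (round(100 * s) for s in shares)))
print(grades)
```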

5.3.4.4 Normal Approximation to Binomial. In 5.2.4.1 the Poisson
distribution was introduced as an approximation to the binomial distribu- 
tion when n is large and p is small. It was stated there that another dis- 
tribution gives a good approximation for large n when p is not small. 
The normal distribution is the distribution with this property. Before 
investigating the nature of this approximation in general, consider a 
numerical example. 

Let $n = 12$ and $p = \tfrac{1}{3}$ and construct the graph of the corresponding
binomial distribution. This is hardly a large value of $n$, so that a good
normal approximation is not to be expected here. Since $f(x)$ is to be
computed for all values of $x$ from 0 to 12, it is easier to compute each
value, after the first, from the preceding one rather than to compute each
value by itself. Here, by (8),

$$f(x) = \frac{12!}{x!\,(12 - x)!} \left(\frac{1}{3}\right)^{x} \left(\frac{2}{3}\right)^{12 - x}$$

It is easily verified that for this frequency function

$$f(x + 1) = \frac{12 - x}{x + 1} \cdot \frac{1}{2}\, f(x)$$

After $f(0)$ was computed, this relationship was used to obtain the following
values.

$$
\begin{aligned}
f(0) &= .007707 & f(7) &= \tfrac{3}{7} f(6) = .047687 \\
f(1) &= 6 f(0) = .046242 & f(8) &= \tfrac{5}{16} f(7) = .014902 \\
f(2) &= \tfrac{11}{4} f(1) = .127166 & f(9) &= \tfrac{2}{9} f(8) = .003312 \\
f(3) &= \tfrac{5}{3} f(2) = .211943 & f(10) &= \tfrac{3}{20} f(9) = .000497 \\
f(4) &= \tfrac{9}{8} f(3) = .238436 & f(11) &= \tfrac{1}{11} f(10) = .000045 \\
f(5) &= \tfrac{4}{5} f(4) = .190749 & f(12) &= \tfrac{1}{24} f(11) = .000002 \\
f(6) &= \tfrac{7}{12} f(5) = .111270
\end{aligned}
\tag{31}
$$

Since $f(0)$ was computed correct to four digits only, the remaining values
would not be expected to be correct to more than four digits, even though
they have been recorded to six decimals for the sake of appearances.
The graph of this binomial distribution is shown in Fig. 7. It appears
that this histogram could be fitted fairly well by the proper normal curve.
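
The step-by-step computation just described can be sketched in a few lines of Python. This is only an illustration under the assumption that (8) is the usual binomial frequency function; the function name `binomial_by_recursion` is not from the text.

```python
from math import comb

def binomial_by_recursion(n, p):
    """Return [f(0), ..., f(n)], each value computed from the preceding one."""
    q = 1.0 - p
    values = [q ** n]  # f(0) = q^n
    for x in range(n):
        # f(x + 1) = (n - x)/(x + 1) * (p/q) * f(x); for p = 1/3, p/q = 1/2
        values.append(values[-1] * (n - x) / (x + 1) * p / q)
    return values

f = binomial_by_recursion(12, 1 / 3)
for x, value in enumerate(f):
    # Direct evaluation of the frequency function, for comparison
    direct = comb(12, x) * (1 / 3) ** x * (2 / 3) ** (12 - x)
    print(x, round(value, 6), round(direct, 6))
```

Because the sketch carries $f(0)$ to full machine precision, its values agree with the tabulated ones only to about four digits, which is the accuracy the text claims for them.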





Since a normal curve is completely determined by its mean and standard 
deviation, the natural normal curve to use here is the one with the same 
mean and standard deviation as the binomial distribution. Hence, 
because of (12), choose 

$$\mu = 12 \cdot \tfrac{1}{3} = 4$$

and

$$\sigma = \sqrt{12 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3}} = 1.63$$

As a test of the accuracy of the normal curve approximation here and 
as an illustration of the use of normal curve methods for approximating 
binomial probabilities, consider a few problems related to Fig. 7. 

If the probability that a marksman will hit a target is $\tfrac{1}{3}$ and if he takes 12
shots, what is the probability that he will score at least six hits? The
exact answer is obtained by adding the values of $f(x)$ from $x = 6$ to $x = 12$,
which, by using (31), is .178, correct to three decimal places. Geometrically,
this answer is the area of that part of the histogram in Fig. 7 lying
to the right of $x = 5.5$. Therefore, to approximate this probability by
normal curve methods, it is merely necessary to find the area under that
part of the fitted normal curve which lies to the right of 5.5. Since the
fitted curve has $\mu = 4$ and $\sigma = 1.63$, it follows that

$$t = \frac{x - \mu}{\sigma} = \frac{5.5 - 4}{1.63} = 0.92$$

But, from Table II, the area to the right of $t = 0.92$ is .179, which, compared
to the correct value of .178, is in error by only about $\tfrac{1}{2}$ per cent.

To test the accuracy of normal curve methods over a shorter interval, 
calculate the probability that a marksman will score precisely six hits in 
the 12 shots. From (31) the answer correct to three decimals is $f(6) = .111$.
To approximate this answer, it is merely necessary to find the area under
the fitted normal curve between $x = 5.5$ and $x = 6.5$. Thus

$$t_2 = \frac{6.5 - 4}{1.63} = 1.53, \qquad A_2 = .4370$$

$$t_1 = \frac{5.5 - 4}{1.63} = 0.92, \qquad A_1 = .3212$$

Therefore the required area is $A_2 - A_1 = .116$, which is in error by about 5 per cent.
From these two examples it appears that normal curve methods are quite 
accurate, even for some situations such as that considered here in which 
n is not very large. 
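
Both marksman calculations can be repeated without tables. The sketch below assumes the half-unit adjustment used in the text (areas taken from 5.5 rather than 6) and uses Python's standard-library normal distribution in place of Table II; all variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 12, 1 / 3
q = 1 - p

# Exact binomial frequency function for x = 0, ..., 12
pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

# Normal curve with the same mean and standard deviation as the binomial
fitted = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

exact_at_least_6 = sum(pmf[6:])
approx_at_least_6 = 1 - fitted.cdf(5.5)          # area to the right of 5.5

exact_6 = pmf[6]
approx_6 = fitted.cdf(6.5) - fitted.cdf(5.5)     # area between 5.5 and 6.5

print(round(exact_at_least_6, 3), round(approx_at_least_6, 3))
print(round(exact_6, 3), round(approx_6, 3))
```

The relative error is much larger for the single value $x = 6$ than for the tail, which matches the comparison made in the text.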

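Thus far the fact that the binomial distribution can be approximated
well for large $n$ by the normal distribution with $\mu = np$ and $\sigma = \sqrt{npq}$
has been made plausible by numerical examples. Now consider the
verification of this fact by means of the moment generating function.
Here it is convenient to use the variable

$$t = \frac{x - \mu}{\sigma}$$

From properties (22) and (13), it follows that

$$M_t(\theta) = e^{-\mu\theta/\sigma} \left(q + p e^{\theta/\sigma}\right)^{n}$$

Taking the logarithm of both sides to the base $e$ gives

$$\log M_t(\theta) = -\frac{\mu\theta}{\sigma} + n \log\left(q + p e^{\theta/\sigma}\right)$$

Expanding $e^{\theta/\sigma}$ and replacing $q + p$ by 1 yields

$$\log M_t(\theta) = -\frac{\mu\theta}{\sigma} + n \log\left\{1 + p\left[\frac{\theta}{\sigma} + \frac{1}{2!}\left(\frac{\theta}{\sigma}\right)^{2} + \frac{1}{3!}\left(\frac{\theta}{\sigma}\right)^{3} + \cdots\right]\right\}$$

If $n$ is chosen sufficiently large, $\sigma = \sqrt{npq}$ can be made so large that for
any fixed value of $\theta$ the sum of the series in brackets will be less than 1
in absolute value. If $p$ times this sum is denoted by $z$, then for sufficiently
large $n$ it follows that $|z| < 1$. Since the logarithm on the right may be
treated as of the form $\log\{1 + z\}$, where $|z| < 1$, the expansion

$$\log\{1 + z\} = z - \frac{z^{2}}{2} + \frac{z^{3}}{3} - \cdots$$

may be applied to give

$$\log M_t(\theta) = -\frac{\mu\theta}{\sigma} + n\left\{p\left[\frac{\theta}{\sigma} + \frac{1}{2!}\left(\frac{\theta}{\sigma}\right)^{2} + \cdots\right] - \frac{p^{2}}{2}\left[\frac{\theta}{\sigma} + \frac{1}{2!}\left(\frac{\theta}{\sigma}\right)^{2} + \cdots\right]^{2} + \frac{p^{3}}{3}\left[\frac{\theta}{\sigma} + \cdots\right]^{3} - \cdots\right\} \tag{32}$$

Collecting terms in powers of $\theta$ gives

$$\log M_t(\theta) = \left(-\frac{\mu}{\sigma} + \frac{np}{\sigma}\right)\theta + n\left(\frac{p}{\sigma^{2}} - \frac{p^{2}}{\sigma^{2}}\right)\frac{\theta^{2}}{2!} + \cdots$$

But, since $np = \mu$ and $\sigma^{2} = npq$, the coefficient of $\theta$ vanishes and the
coefficient of $\theta^{2}/2!$ reduces to 1; consequently,

$$\log M_t(\theta) = \frac{\theta^{2}}{2} + \text{terms in } \theta^{k}, \qquad k = 3, 4, \cdots$$

From an inspection of (32), which shows how terms in $\theta^{k}$ arise, it is clear
that all terms in $\theta^{k}$ contain $n/\sigma^{k}$ as a common factor. The other factor
for each such term is a constant times a power of $p$. Since this other
factor does not involve $n$ and since

$$\frac{n}{\sigma^{k}} = \frac{n}{(npq)^{k/2}}$$

with $k \ge 3$ here, all such terms will approach zero as $n$ becomes infinite.
This implies that

$$\lim_{n \to \infty} \log M_t(\theta) = \frac{\theta^{2}}{2}$$

which in turn implies that

$$\lim_{n \to \infty} M_t(\theta) = e^{\theta^{2}/2} \tag{33}$$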

A justification of these expansions and limits would require a knowledge 
of advanced calculus methods and therefore is not considered here. 
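
Although the justification is omitted, the limit itself can be watched numerically. The following sketch assumes the form $\log M_t(\theta) = -\mu\theta/\sigma + n \log(q + p e^{\theta/\sigma})$ for the standardized binomial variable; the function name and the particular values of $n$ are illustrative choices, not taken from the text.

```python
from math import exp, log, sqrt

def log_mgf_standardized_binomial(theta, n, p):
    """log of the moment generating function of t = (x - np)/sqrt(npq)."""
    q = 1 - p
    mu, sigma = n * p, sqrt(n * p * q)
    return -mu * theta / sigma + n * log(q + p * exp(theta / sigma))

theta, p = 1.0, 1 / 3
sizes = (12, 48, 192, 768, 3072)
# Distance from the limiting value theta^2 / 2 for each sample size
gaps = [abs(log_mgf_standardized_binomial(theta, n, p) - theta ** 2 / 2)
        for n in sizes]

for n, gap in zip(sizes, gaps):
    print(n, gap)
```

Each fourfold increase in $n$ roughly halves the gap, as one would expect if the leading neglected term is the one in $\theta^{3}$, which carries the factor $n/\sigma^{3}$.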

Now consider the moment generating function of a normal variable
as given by (26) with $b = \sigma$. Using property (22) once more, it follows
that for a normal variable

$$M_{(x - \mu)/\sigma}(\theta) = M_{x - \mu}\left(\frac{\theta}{\sigma}\right) = e^{\theta^{2}/2}$$

A comparison of this result and (33) shows that the modified binomial
variable $t = (x - np)/\sqrt{npq}$ has a moment generating function that
approaches the moment generating function of the normal variable whose
mean is 0 and whose standard deviation is 1. This implies that all the
moments of this variable approach those of the standard normal variable.

In order to complete this discussion, it is necessary to introduce two 
very important theorems of advanced theoretical statistics. 

The first theorem states that a distribution function is uniquely determined
by its moment generating function when it exists. For example,
if the moment generating function of some variable $z$ is found to be
$e^{\theta^{2}/2}$, then $z$ must be a standard normal variable.

The second theorem states that if a variable, which depends upon $n$,
has a moment generating function that approaches the moment generating
function of a second variable, then the distribution function of the first
variable must approach the distribution function of the second variable
as $n \to \infty$.

The preceding theorems insure that the distribution of the modified
binomial variable $(x - np)/\sqrt{npq}$ will approach that of a standard normal
variable because by (33) its moment generating function approaches the
moment generating function of a standard normal variable. A precise
statement of these two theorems, including conditions under which they
hold, is not made here; however, these theorems are used on several
occasions. A direct application of these theorems to (33) yields the following
important result.

Theorem 2: If $x$ represents the number of successes in $n$ independent
trials of an event for which $p$ is the probability of success in a single trial,
then the variable $(x - np)/\sqrt{npq}$ has a distribution that approaches the
normal distribution with mean 0 and standard deviation 1 as the number
of trials becomes increasingly large.

This theorem justifies the use of normal curve methods for approximating
probabilities related to successive trials of an event when $n$ is
large. Experience indicates that the approximation is fairly good as long
as $np > 5$ when $p \le \tfrac{1}{2}$ and $nq > 5$ when $p > \tfrac{1}{2}$. A very small value of $p$,
together with a moderately large value of $n$, would yield a small mean
and thus produce a skewed distribution. Similarly, if $p$ is very close to
one and $n$ is only moderately large, most of the distribution will be piled
up close to $x = n$, thus preventing a normal curve from fitting well. If
the mean is at least five units away from either extremity, the distribution
has sufficient room to become fairly symmetrical. Figures 8 and 9 indicate
how rapidly the distribution of the variable $(x - np)/\sqrt{npq}$ approaches
normality when $p = \tfrac{1}{3}$ and $n = 24$ and 48, respectively. The common $y$
scale for these two graphs is approximately 17 times that for the $x$ axis.

Fig. 9. Binomial distribution of $(x - np)/\sqrt{npq}$ for $p = \tfrac{1}{3}$ and $n = 48$.

There are numerous occasions when it is more convenient to work with
the proportion of successes in $n$ trials than with the actual number of
successes. Since

$$\frac{x - np}{\sqrt{npq}} = \frac{\dfrac{x}{n} - p}{\sqrt{\dfrac{pq}{n}}}$$

the following useful corollary to Theorem 2 may be obtained.

(34) Corollary: The proportion of successes $x/n$ will be approximately
normally distributed with mean $p$ and standard deviation $\sqrt{pq/n}$ if $n$ is
sufficiently large.

The two approximations that have been considered for the binomial 
distribution, namely the Poisson and normal distributions, are sufficient 
to permit one to solve all the simpler problems that require the compu- 
tation of binomial probabilities. If n is small, one uses formula (8) 
directly because the computations are then quite easy. Tables of fac- 
torials and logarithms are helpful here. If n is large and p is small or 
large, the Poisson approximation may be used. If n is large and p is not 
small or large, the normal approximation may be used. Thus all possi- 
bilities have been covered. 

5.3.4.5 Applications. Certain types of practical problems dealing with 
percentages can be solved by means of the normal approximation to the 
binomial distribution. As a first illustration, consider the following genetics
problem. According to Mendelian inheritance theory, certain crosses
of peas should give yellow and green peas in a ratio of 3 : 1. In an experiment
176 yellow and 48 green peas were obtained. Do these conform to
theory?

This problem may be considered as a problem of testing a statistical
hypothesis. The 224 peas may be treated as 224 trials of an experiment
for which the probability of obtaining a success, that is a yellow pea, in a
single trial is $\tfrac{3}{4}$. Thus the number of yellow peas $x$ is treated as a binomial
variable and the hypothesis to be tested is

$$H_0 : p = \tfrac{3}{4}$$

Under $H_0$,

$$\mu = np = 168 \quad \text{and} \quad \sigma = \sqrt{npq} = 6.5$$

From the experimenter's point of view an experiment corroborates theory
if its results are sufficiently close to expectation. In this problem it is
therefore a question of deciding whether 176 is sufficiently close to 168.
Since poor experimental results correspond to large deviations from the
mean, whether positive or negative, the experimenter would naturally
choose the two tails of the binomial distribution for his critical region.
If a critical region of size .05 is selected, which is the size that is almost
always selected in this book, it is necessary to determine how far out on
the tails of the binomial histogram to go so that the areas of the two
extremities will total .05. Since $n$ is sufficiently large to yield an excellent
normal curve approximation to the binomial histogram, it will suffice
to determine how far out on the tails of the fitted normal curve to go so
that the areas of the two extremities will total .05. Because $|t| = 2$, where
$t = (x - \mu)/\sigma$, corresponds to an interval of two standard deviations on
both sides of the mean and this interval includes 95 per cent, to the nearest
per cent, of the normal curve area, it is customary to go out a distance of
two standard deviations to determine the desired critical region rather
than the more accurate Table II value of $|t| = 1.96$. For this problem,
therefore, the critical region will consist of the two tail intervals

$$x < \mu - 2\sigma = 168 - 13 = 155$$

and

$$x > \mu + 2\sigma = 168 + 13 = 181$$

If an experimental value should fall in this critical region, the hypothesis
$H_0$ would be rejected, which means that the experimental value would
not be considered as compatible with Mendelian theory. Such an experimental
value would be said to be significant because it signifies that some
other theory is needed to explain the experimental outcome. On the
other hand, if the experimental value should fall in the acceptance region,
then the experimental value would be considered as corroborating the
theory. Since the experimental value of 176 falls within the acceptance
interval of 155-181, there is no reason on this basis for doubting that
Mendelian inheritance is operating here.
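
The arithmetic of this two-tailed test is short enough to sketch directly. The helper name `two_sigma_region` is an illustrative assumption; the two-standard-deviation rule is the one adopted in the text rather than the more exact 1.96.

```python
from math import sqrt

def two_sigma_region(n, p):
    """Mean, standard deviation, and two-standard-deviation limits for a binomial count."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu, sigma, mu - 2 * sigma, mu + 2 * sigma

mu, sigma, lower, upper = two_sigma_region(224, 3 / 4)
observed = 176  # yellow peas among the 224 obtained

# The hypothesis is rejected only if the count falls in either tail
reject = observed < lower or observed > upper
print(mu, round(sigma, 1), round(lower), round(upper), reject)
```

With 176 yellow peas the observed value lies inside the acceptance interval, so `reject` is false, in agreement with the conclusion above.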

In solving this problem, the mathematical model selected was the binomial
frequency function with $n = 224$ and $p = \tfrac{3}{4}$. The normal frequency
function was used only as an approximation to determine the
critical region for testing the hypothesis

$$H_0 : p = \tfrac{3}{4}$$

Because of the nature of his problem, the experimenter would undoubtedly
choose as his alternative hypothesis

$$H_1 : p \ne \tfrac{3}{4}$$

This type of alternative is similar to that considered in 3.2.2, which gave
rise to the power function. The approximate power function could be 
obtained here by using the normal approximation to the binomial. By 
means of the power function one could tell how effective the choice of 
the two equal tails as the critical region is for detecting various possible 
alternative values of p. It can be shown by methods that are considered in 
Chapter 9, that there is no choice of critical region that is best for all 
possible alternative values of p. However, it can also be shown that the 
choice of the two equal tails is an excellent compromise, thereby justifying 
the selection made on intuitive grounds. 

As a second illustration, consider the following problem. From past 
experience the manufacturer of parts finds that when a machine is func- 
tioning properly 5 per cent of the parts are defective on the average. 
During the course of a day’s operation by a certain operator 400 parts 
are turned out, 30 of which are defective. Is the operator running the 
machine properly ? 

The answer to this question depends upon what is meant by the word 
properly. Here it will be assumed that properly means that the number of 
defective parts should not be greater than what could be reasonably 
attributed to chance for a normal operator. If the operator in question 
is considered normal, the 400 parts may be thought of as 400 trials of an 
experiment for which the probability of obtaining a defective part in a single
trial is .05. The number of defective parts $x$ is treated as a binomial
variable and the problem then becomes a problem of testing the statistical
hypothesis

$$H_0 : p = .05$$

Since the employer would be interested only in knowing whether this 
operator is normal, as contrasted to being worse than normal, he would 
be interested in knowing what the probability is that a normal operator 
will turn out 30 or more defective parts in a lot of 400. This probability
could be obtained by using (8) with $n = 400$ and $p = .05$ to calculate
the successive values of $f(30), f(31), \cdots, f(400)$ and then adding these
probabilities. It is much easier, to put it mildly, to approximate the sum
of these probabilities by finding the area to the right of 29.5 under the
approximating normal curve. Since

$$\mu = np = 20 \quad \text{and} \quad \sigma = \sqrt{npq} = 4.36$$

it is merely necessary to find the area in Table II to the right of the value

$$t = \frac{x - \mu}{\sigma} = \frac{29.5 - 20}{4.36} = 2.18$$

This area is .015; consequently, the probability is approximately .015
that a normal operator will turn out 30 or more defective parts in a lot of
400. Now this day's experience may be thought of as but one of an indefinite
sequence of similar days' experiences for normal operators.
The result may therefore be interpreted by stating that a normal operator
would have a day as bad or worse than this only about 15 days in every
1000, on the average. From the employer's point of view this operator
has undoubtedly turned out more defective parts than can be reasonably
attributed to chance; consequently, he would be accused of not running
the machine properly.
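
The right-tail area can be obtained without Table II, and with a computer the direct summation of (8) that the text sets aside as laborious is also feasible. The sketch below assumes the same half-unit adjustment (area to the right of 29.5); the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, observed = 400, 0.05, 30
q = 1 - p
mu = n * p
sigma = sqrt(n * p * q)

# Normal approximation: area to the right of 29.5 in standard units
t = (observed - 0.5 - mu) / sigma
tail = 1 - NormalDist().cdf(t)

# Direct summation of the binomial frequency function from 30 to 400
exact = sum(comb(n, x) * p ** x * q ** (n - x) for x in range(observed, n + 1))

print(round(t, 2), round(tail, 3))
print(exact)
```

Comparing `exact` with `tail` gives a direct impression of how well the normal curve serves here, where $p$ is small and the histogram is still somewhat skewed.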

The critical region for testing $H_0$ here would be chosen to be the right
tail of the binomial distribution, rather than both tails as in the preceding
illustration, because the logical alternative hypothesis here from the
employer's point of view is

$$H_1 : p > .05$$

If, as is customary, a critical region of size .05 had been selected, the value
of $x = 30$ would have been judged significant, hence $H_0$ would have been
rejected in favor of $H_1$. By computing the probability of .015, however,
it was possible to determine how small a type I error could have been
used and still have rejected $H_0$.

If the normal approximation is used, it can be shown by the methods
of Chapter 9 that the power curve for the foregoing choice of critical
region is nowhere exceeded by the power curve of any other critical region
of size .05 for $p > .05$; consequently, the foregoing test is the best possible,
based on the normal approximation.

The practical reasonableness of the decision made in this problem
depends upon the extent to which the mathematical model used here
represents the actual situation. If the successive parts turned out by
normal operators do not behave like independent trials of an experiment
for which $p$ is constant from trial to trial, then theoretically one is not
justified in applying these methods, although practically they may give
good results. It might happen, as it often does, that the variability of
normal operators is much larger than that given by $\sigma = \sqrt{npq}$ or that
the percentage of defective parts varies with the day of the week or the
condition of the machine.

Fig. 10. Control chart for fraction defective.

As a third illustration, consider the problem just alluded to of deter- 
mining whether daily percentages of defectives may be treated as inde- 
pendent trials of an experiment for which p is constant from trial to trial. 
Industrial experience has shown that most production processes do not 
behave in this idealized manner and that much valuable information is 
obtained concerning the process if the order in which data are obtained 
is preserved. A simple graphical method, called a quality-control chart,
has been found highly useful in the solution of this problem. Such a 
chart for the proportion of defectives is illustrated in Fig. 10. The middle 
line is thought of as corresponding to the process proportion defective, 
although it is usually merely the mean of past daily proportions. The 
other two lines serve as control limits for daily proportions of defectives. 
From (34) it will be observed that these two control lines are spaced 
three standard deviations from the mean line. The time units for successive 
samples are recorded along the x axis. If now the production process 
behaves in the idealized manner and if the normal approximation to the 
binomial distribution may be used, the probability that a daily proportion 
when plotted on this chart will fall outside the control band is approxi- 
mately equal to the probability that a normal variable will assume a value 
more than three standard deviations away from its mean, which, from 
Table II, is .003. Because of this small probability, it is reasonable to 
assume that the production process is no longer behaving properly when 
a point falls outside the control band; consequently, the production 
engineer checks over the various steps in the process when this event 
occurs. From an inspection of Fig. 10 it will be observed that the process 
in question went out of control on the twelfth day. 
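
A control chart of this kind is easy to sketch in code. The example below assumes control lines three standard deviations from the mean line, with the standard deviation of a proportion taken from (34); the daily proportions, the process proportion of .05, and the sample size of 400 are hypothetical and are not the data of Fig. 10.

```python
from math import sqrt
from statistics import NormalDist

def control_limits(p_bar, n):
    """Lower and upper control lines, three standard deviations from p_bar."""
    s = sqrt(p_bar * (1 - p_bar) / n)  # standard deviation of a proportion
    return max(0.0, p_bar - 3 * s), p_bar + 3 * s

def out_of_control(daily, p_bar, n):
    """Days (numbered from 1) whose proportion falls outside the control band."""
    lo, hi = control_limits(p_bar, n)
    return [day for day, prop in enumerate(daily, start=1) if not lo <= prop <= hi]

# Hypothetical daily proportions of defectives, 400 parts per day
daily = [0.050, 0.040, 0.0625, 0.045, 0.090, 0.055]
lo, hi = control_limits(0.05, 400)
flagged = out_of_control(daily, 0.05, 400)

# Chance that a normal variable lies more than three standard deviations from its mean
prob_outside = 2 * (1 - NormalDist().cdf(3))
print(round(lo, 4), round(hi, 4), flagged, round(prob_outside, 3))
```

In this made-up series only the fifth day is flagged; the order of the observations is preserved so that the day of the disturbance can be identified.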



Industrial experience shows that only rarely does a production process
behave in this idealized manner when the control chart technique is first
applied. Nevertheless, the technique is highly useful because it enables
one to discover causes of a lack of control and to improve on the
production process until gradually statistical control has been obtained.
This illustration and discussion of the quality-control chart gives an
incomplete picture of how quality-control methods operate. Such
methods constitute an extensive field of applied statistics, and numerous
articles and books concerning them are available.

As a final illustration, consider the problem of determining how large
a sample of university students should be taken if it is desired to estimate
the proportion of students who work part time to within .04 units of the
true proportion. Since the accuracy of estimates cannot be guaranteed
unless most of the population is sampled, it is customary to express the
accuracy of an estimate by stating the probability that the error of estimate
will not exceed a fixed amount. Here let the probability
be .95 that the error of estimate will not exceed .04. From a geometrical
point of view, this means that 95 per cent of the time the estimate $x/n$
should fall within .04 units of the true proportion $p$. Since $p$ is obviously
not very small nor very large here and a fairly large sample is going to be
needed, it may be assumed that $x/n$ is approximately normally distributed.
Now, for a normal variable, the variable will fall within .04 unit of its
mean with a probability of .95 only if .04 unit is equal to two standard
deviations of the variable; hence it is necessary that

$$2\sigma_{x/n} = .04$$

But, from (34), $\sigma_{x/n} = \sqrt{pq/n}$; hence it is necessary that

$$2\sqrt{pq/n} = .04$$

or, solving for $n$, that

$$n = 2500pq \tag{35}$$

Since $p$ is unknown, it is necessary to estimate $p$ in some manner. It is
easy to show by calculus methods that the function $pq = p(1 - p)$ assumes
its maximum value for $p = \tfrac{1}{2}$; hence, if this value of $p$ is used, the maximum
possible sample size will be obtained. Thus, if $n = 625$, the sample
will certainly be large enough. A more economical approach would be
first to take a preliminary sample of, say, 100, estimate $p$ from it, and then
use this estimate in (35) to estimate the total $n$. For still greater accuracy one
would take only part, say J, of the sample indicated by the estimated value 
of $n$, combine the two preliminary samples to obtain a new estimate of $p$,
and use this estimate in (35) to obtain a final estimate of $n$.
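
Formula (35) and the worst-case argument can be sketched as follows. The function name `sample_size` and its default arguments are illustrative assumptions; the factor of two standard deviations is the one used in the text.

```python
from math import ceil

def sample_size(p, error=0.04, z=2.0):
    """Sample size for which z standard deviations of x/n equal `error`.

    With z = 2 and error = .04 this is n = 2500 p q, as in (35).
    """
    n = (z / error) ** 2 * p * (1 - p)
    # Round off floating-point noise before rounding up to a whole sample
    return ceil(round(n, 6))

worst_case = sample_size(0.5)   # p q is largest at p = 1/2
pilot_based = sample_size(0.3)  # hypothetical estimate of p from a preliminary sample

print(worst_case, pilot_based)
```

A preliminary estimate of $p$ away from one half reduces the required sample, which is the economy the two-stage procedure is meant to capture.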




5.4 Other Distributions 


This section defines and discusses very briefly two other discrete dis- 
tributions that are important in statistical work, and in addition it dis- 
cusses the problem of transforming continuous non-normal distributions 
so that they become approximately normal. Thus the purpose of this 
section is to extend somewhat the methods of the preceding sections to a 
wider class of problems. 


5.4.1 Hypergeometric Distribution 

The binomial distribution was derived on the basis of n independent 
trials of an experiment; however, if the experiment consists of selecting 
individuals from a finite population of individuals, the trials will not be 
independent. For example, if a sample of 20 students is to be chosen from 
a group of 100 students for the purpose of studying the extent to which 
students work part time, it is clear that the probability of selecting a 
student who works part time need not remain fixed as successive individ- 
uals are selected for the sample. For large finite populations the error 
arising from assuming that p is constant and the trials are independent, 
when sampling the population, is very small and it may be ignored, in 
which case the binomial model is satisfactory. However, for problems in 
which the population is so small that a serious error will be introduced in 
using the binomial distribution it is necessary to apply a more appropriate 
distribution known as the hypergeometric distribution. It can be derived 
as follows. 

Let N denote the size of the population from which a sample of n is 
to be drawn. Let the proportion of individuals in this finite population 
who possess, say, property A be denoted by p. If x is the random variable 
corresponding to the number of individuals in the sample of n who 
possess property A, then the problem is to find the frequency function of 
$x$. Since the $x$ individuals must come from the $Np$ individuals in the
population with property A and the remaining n — x individuals must 
come from the N — Np who do not possess the property, it follows 
from the methods illustrated in 2.8.4 that the desired frequency function 
will be given by the following formula. 


(36) Hypergeometric Distribution:

$$f(x) = \frac{\dbinom{Np}{x} \dbinom{N - Np}{n - x}}{\dbinom{N}{n}}$$




Calculations with this formula will show that when $n$ is only a small
percentage of $N$ the value of $N$ must be quite small before there will be
any appreciable difference between the values given by this formula and
the binomial formula in (8). As an illustration, suppose a population
consists of 100 individuals, of whom 10 per cent have high blood pressure.
Then calculations will show, for example, that the probability of getting
at most two individuals with high blood pressure in a sample of 10 is

$$P\{x \le 2\} = \sum_{x=0}^{2} \frac{\dbinom{10}{x} \dbinom{90}{10 - x}}{\dbinom{100}{10}} = .94$$

If the binomial formula (8) is used, additional calculations will show that
one then obtains

$$P\{x \le 2\} = \sum_{x=0}^{2} \frac{10!}{x!\,(10 - x)!} \left(\frac{1}{10}\right)^{x} \left(\frac{9}{10}\right)^{10 - x} = .93$$

The hypergeometric distribution will not be needed until a later chapter;
however, it is introduced here to show how one can employ a more refined
model than the binomial distribution for binomial-type problems
when the binomial assumptions are not strictly realized.
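
The blood-pressure comparison can be carried out exactly with integer binomial coefficients. In this sketch the argument `K` stands for $Np$, the number of individuals in the population with the property; the function names are illustrative and not from the text.

```python
from math import comb

def hypergeometric_pmf(x, N, K, n):
    """Probability of x individuals with the property in a sample of n,
    drawn without replacement from N individuals of whom K have it."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """Binomial frequency function, which treats the n trials as independent."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Population of 100, 10 per cent with high blood pressure, sample of 10
hyper = sum(hypergeometric_pmf(x, 100, 10, 10) for x in range(3))
binom = sum(binomial_pmf(x, 10, 0.1) for x in range(3))

print(round(hyper, 3), round(binom, 3))
```

The two probabilities differ by about .01, even though the sample is a full 10 per cent of the population.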


5.4.2 Multinomial Distribution 

The binomial distribution is capable of solving only those successive 
trials problems in which each outcome can be classified as either a success 
or a failure. Problems frequently arise, however, in which it is desirable 
to have more than two categories of classification. For example, in 
studying blood types it is necessary to use four groupings in order to treat
such problems adequately. One can always reduce more than two categories
to only two by combining them, but this procedure is likely to
throw away much valuable information; therefore it would be desirable 
to have a distribution that takes account of all such categories. Such a 
distribution exists in what is known as the multinomial distribution. It is 
obtained in the following manner. 

Consider an experiment in which there are $k$ mutually exclusive possible
outcomes $A_1, A_2, \cdots, A_k$. Let $p_i$ be the probability that event $A_i$ will
occur at a trial of the experiment and let $n$ trials be made. Then the
probability that event $A_1$ will occur $x_1$ times, event $A_2$ will occur $x_2$ times,
etc., where $\sum_{i=1}^{k} x_i = n$, may be calculated by using the same reasoning as
that used in deriving the binomial distribution. In this connection, consider
the particular sequence of events given by

$$\underbrace{A_1 A_1 \cdots A_1}_{x_1}\ \underbrace{A_2 A_2 \cdots A_2}_{x_2}\ \cdots\ \underbrace{A_k A_k \cdots A_k}_{x_k}$$

Since the trials are independent, the probability of obtaining this particular
sequence of events is

$$p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \tag{37}$$

Now every arrangement of the preceding set of $A$'s has this same probability
of occurring and satisfies the conditions of the problem; consequently,
it is necessary to count the number of arrangements. But
this is merely the number of permutations of $n$ things of which $x_1$ are alike,
$x_2$ are alike, etc., which by (18), Chapter 2, is equal to

$$\frac{n!}{x_1!\, x_2! \cdots x_k!} \tag{38}$$

Since all these arrangements are the mutually exclusive ways in which
the desired event can occur and since each of them has the probability
given by (37), the desired probability is obtained by multiplying the
quantities given in (37) and (38). This result may be summarized as
follows.

(39) Multinomial Distribution:

$$f(x_1, x_2, \cdots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

This name is given to the distribution because (39) represents the
general term in the expansion of the multinomial function

$$(p_1 + p_2 + \cdots + p_k)^{n}$$

just as the binomial frequency function represents the general term in
the expansion of the binomial function $(q + p)^{n}$.

As it stands, the multinomial distribution is not very convenient for 
calculating probabilities unless n is small. The problem of finding an 
approximation here is considerably more difficult than in the case of the 
binomial distribution. As in the case of the hypergeometric distribution, 
this distribution will not be needed until a later chapter, but it is introduced
here to show how the binomial distribution model can be generalized
to treat more complicated problems of the repeated-trials-counting
type.
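
For small $n$ the multinomial probabilities are simple to compute. The sketch below multiplies the number of arrangements by the probability of one particular sequence, following the derivation above; the function name `multinomial_pmf` and the die example are illustrative assumptions.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """Probability that outcome i occurs counts[i] times in sum(counts) trials."""
    n = sum(counts)
    # Number of arrangements: n! / (x_1! x_2! ... x_k!), kept as an exact integer
    arrangements = factorial(n)
    for x in counts:
        arrangements //= factorial(x)
    # Probability of any one particular sequence: p_1^x_1 ... p_k^x_k
    one_sequence = prod(p ** x for p, x in zip(probs, counts))
    return arrangements * one_sequence

# Hypothetical example: each face of a fair die appears once in six throws
each_face_once = multinomial_pmf([1] * 6, [1 / 6] * 6)

# With k = 2 the formula reduces to the binomial frequency function
two_categories = multinomial_pmf([4, 8], [1 / 3, 2 / 3])
binomial_value = comb(12, 4) * (1 / 3) ** 4 * (2 / 3) ** 8

print(each_face_once, two_categories, binomial_value)
```

The agreement of the last two values illustrates that the binomial distribution is the special case of two categories.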



5.4.3 Change of Variable

The normal distribution is a very useful model for continuous variables
that possess empirical distributions resembling the one shown in Fig. 1,
Chapter 4; however, something different is needed for empirical distributions
like the one in Fig. 3, Chapter 4. There are numerous techniques
that can be used to solve statistical problems when the basic distribution
differs from that of a normal variable. Some of them are discussed in
later chapters. One of them is discussed here to point out how methods
based on the assumption of an underlying normal distribution actually
have a wider range of applicability than might otherwise be assumed. The
technique that is explained here is that of transforming the basic variable.

Suppose one has a random variable $x$ whose frequency function $f(x)$
differs considerably from that of a normal variable. Is it possible to find
a change of variable, say $y = h(x)$, such that the frequency function of
$y$ will be approximately normal? If one thinks of what this means geometrically,
one would surmise that the answer is yes. In this connection,
compare the graph of a particular frequency function, $f(x)$, given in Fig.
11, with that of a standard normal variable, $g(y)$, shown in the same
sketch.

To any value of $x$, say $x_0$, one can find a corresponding value of $y$,
denoted by $y_0$, such that the areas to the left of these values under the
corresponding frequency curves will be equal. If one chooses a large
number of values of $x$ and obtains the corresponding values of $y$ by means
of Table II, then this set of $x$ and $y$ values will yield a functional relationship
which may serve as a change of variable that transforms the
non-normal $f(x)$ into the normal $g(y)$. This relationship is a numerical
one and therefore an approximation of the complete relationship.

If the complete relationship $y = h(x)$ were known, one could transform
any $x$ value over to its corresponding $y$ value and treat it as an observation
taken from a standard normal variable population.
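
The equal-area matching just described can be sketched numerically. The example assumes, purely for illustration, that $f(x) = e^{-x}$ for $x > 0$, so that the area to the left of $x$ is $1 - e^{-x}$; the inverse of the standard normal distribution function plays the part of reading Table II backwards. The function name `equal_area_transform` is hypothetical.

```python
from math import exp, log
from statistics import NormalDist

def equal_area_transform(x_values, area_to_left):
    """Pair each x with the y having the same area to its left
    under the standard normal curve."""
    z = NormalDist()
    return [(x, z.inv_cdf(area_to_left(x))) for x in x_values]

def exponential_area(x):
    """Area to the left of x under f(x) = exp(-x), x > 0 (an assumed example)."""
    return 1 - exp(-x)

x_values = [0.1, log(2), 1.0, 2.0, 4.0]
pairs = equal_area_transform(x_values, exponential_area)

for x, y in pairs:
    print(round(x, 3), round(y, 3))
```

The median of the assumed distribution, $x = \log 2$, is carried to $y = 0$, and larger values of $x$ are carried to larger values of $y$, so the tabulated pairs describe an increasing relationship $y = h(x)$.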




Fig. 11. Graphs of two frequency functions. 





Fig. 12. The graph of an increasing function. 


One can reverse the preceding process by starting with a given frequency
function $f(x)$ and a given change of variable $y = h(x)$ and ask for the
frequency function $g(y)$ of the new variable. If $g(y)$ should turn out to
be a normal variable, or approximately so, then the transformation
$y = h(x)$ would have accomplished the desired objective. Now it is
relatively easy to find $g(y)$ if the function $h(x)$ in the transformation is an
increasing function of $x$, or a decreasing function, throughout the range
of $x$ values. The technique for doing this, which is now demonstrated, is
based upon finding the distribution function of $y$.

It follows from formula (32), Chapter 2, that the distribution function
of $y$, which is denoted by $G(y)$, satisfies the relations

$$G(t) = P\{y \le t\} = P\{h(x) \le t\} \tag{40}$$

where $t$ is any desired value. Now the inequality $h(x) \le t$ can be expressed
as an inequality on $x$. The relationship between $y$ and $x$ where $h(x)$ is an
increasing function is like that shown in Fig. 12. For such a relationship
there is a unique value of $x$ to each value of $y$. Here the value of $x$ corresponding
to the value $t$ for $y$ has been denoted by $\tau$; consequently,
since $h(x) \le t$ if, and only if, $x \le \tau$,

$$P\{h(x) \le t\} = P\{x \le \tau\} = \int_{-\infty}^{\tau} f(x)\, dx$$

Thus from (40)

$$G(t) = \int_{-\infty}^{\tau} f(x)\, dx$$

Now, as shown in Chapter 2, a continuous frequency function can be
obtained by differentiating the corresponding distribution function. In view
of the fact that $\tau$ is a function of $t$, it follows from the calculus formula
for differentiating an integral with respect to its upper limit that

$$\frac{dG(t)}{dt} = \frac{dG(t)}{d\tau} \frac{d\tau}{dt} = f(\tau) \frac{d\tau}{dt}$$


THEORETICAL FREQUENCY DISTRIBUTIONS OF ONE VARfABtf 121 


Since t and t were any pair of corresponding values of y and respec- 
tively, and were introduced to keep from confusing upper-limit variables 
with dummy variables of integration, this relationship may be rewritten 

dy ' dy 

But in view of the relationship between a distribution function: and its 
frequency function, the left side is the frequency function of y ; hence the 
desired formula is 

?(«/)==/(*) ^ 
dy 

If one follows through this derivation for a change of variable y = h(x)
in which h(x) is a decreasing function, he will obtain the negative of this
result. Since dx/dy will be negative in this case, a formula that is valid
for both cases will be given by

(41)    $$g(y) = f(x)\,\left|\frac{dx}{dy}\right|$$

Before this formula can be applied, it is necessary to replace x in f(x)
by its value in terms of y, which means that it is necessary to solve the
relation y = h(x) for x in terms of y. One can calculate dx/dy from this
inverse relationship, or else calculate dy/dx from the original relationship
y = h(x) and take its reciprocal.

Because of the importance and usefulness of this formula for later 
work, the result of this derivation is expressed formally. 

(42) Change of Variable Technique: If y = h(x) is an increasing or
decreasing function and f(x) is the frequency function of x, then g(y), the
frequency function of y, is given by the formula

$$g(y) = f(x)\,\left|\frac{dx}{dy}\right|$$

in which x is to be replaced by its value in terms of y by means of the
relation y = h(x).

Although formula (41) will not be valid unless h(x) is either an increasing
or a decreasing function, the procedure used to derive the formula can
be applied to more complicated problems.

As an illustration of the use of formula (41), consider the problem of
finding the frequency function of y if $y = \sqrt{x}$ and $f(x) = e^{-x}$,
x > 0. Since $y = \sqrt{x}$ is an increasing function of x, formula (41)
may be applied without the absolute value signs. The inverse relationship
is $x = y^2$; hence

$$g(y) = e^{-y^2}\,\frac{dx}{dy} = 2y\,e^{-y^2}$$

The relationship between these two frequency functions is shown
geometrically in Fig. 13. Incidentally, it will be observed that g(y) is
considerably more like a normal curve in appearance than is f(x).

Fig. 13. Distribution of x and $y = \sqrt{x}$ for $f(x) = e^{-x}$, x > 0.
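
The illustration above can be checked numerically. The following is only a minimal sketch, assuming that pseudo-random exponential values from Python's standard `random` module stand in for observations of x; the helper names `g`, `G`, and `empirical_cdf` are introduced here for illustration and do not appear in the text. It compares the observed proportion of values of $y = \sqrt{x}$ not exceeding t with the distribution function $G(t) = 1 - e^{-t^2}$ obtained by integrating $g(y) = 2y\,e^{-y^2}$.

```python
import math
import random

def g(y):
    """Frequency function of y = sqrt(x) from formula (41): f(x) dx/dy with x = y**2."""
    return math.exp(-y * y) * 2.0 * y

def G(t):
    """Distribution function of y, the integral of g from 0 to t."""
    return 1.0 - math.exp(-t * t)

def empirical_cdf(sample, t):
    """Proportion of sample values that do not exceed t."""
    return sum(1 for v in sample if v <= t) / len(sample)

# Assumption: x has f(x) = exp(-x), simulated with a fixed seed for repeatability.
rng = random.Random(0)
ys = [math.sqrt(rng.expovariate(1.0)) for _ in range(100_000)]

for t in (0.5, 1.0, 1.5):
    print(t, round(empirical_cdf(ys, t), 3), round(G(t), 3))
```

The two printed columns should agree to about two decimal places; agreement of this kind illustrates the formula but does not prove it.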

As a second illustration, consider the problem of finding the frequency
function of the kinetic energy $E = mv^2/2$, given the distribution of the
velocity v. The frequency function of v, the velocity for a gas molecule
with mass m, is given by

$$f(v) = a\,v^2 e^{-bv^2}$$

where v > 0, b is a constant depending on the gas, and a is determined
to yield unit area. Since $E = mv^2/2$ is an increasing function of v for
positive values of v, formula (41) may be applied by choosing x = v
and y = E. Here the inverse relationship is $v = \sqrt{2E/m}$, so that
$dv/dE = 1/\sqrt{2mE}$; therefore

$$g(E) = a\,\frac{2E}{m}\,e^{-2bE/m}\,\frac{1}{\sqrt{2mE}}$$

or

$$g(E) = \alpha\,E^{1/2} e^{-\beta E}$$

where $\alpha$ and $\beta$ are constants depending upon a, b, and m.

5.4.3.1 Chi-Square Distribution. As mentioned in the preceding sec- 
tion, the technique that was used to derive formula (41) can be used to 
obtain g(y), even though the function h(x) is not increasing or decreasing 
throughout its range of x values. This fact is illustrated by obtaining a 
frequency function, known as a chi-square function, which has many 
important uses in statistical theory and practice. 



Toward this end, consider the problem of finding g(y) when $y = x^2$
and $f(x) = e^{-x^2/2}/\sqrt{2\pi}$. Here one starts with a standard normal
variable x and wishes to find the distribution of the variable $x^2$. The
function $y = x^2$ is certainly not increasing or decreasing for all values
of x; therefore formula (41) cannot be employed directly. Proceeding as in
the derivation of (41),

$$G(t) = P\{y \le t\} = P\{x^2 \le t\} = P\{-\sqrt{t} \le x \le \sqrt{t}\}$$

Hence

$$G(t) = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{t}}^{\sqrt{t}} e^{-x^2/2}\,dx = \frac{2}{\sqrt{2\pi}} \int_{0}^{\sqrt{t}} e^{-x^2/2}\,dx$$

Since this integration cannot be performed, it is necessary to differentiate
at this stage. Thus, using the same calculus formulas as before,

$$\frac{dG(t)}{dt} = \frac{2}{\sqrt{2\pi}}\,e^{-t/2}\,\frac{d\sqrt{t}}{dt} = \frac{t^{-1/2} e^{-t/2}}{\sqrt{2\pi}}$$

The desired frequency function is now obtained by replacing t by y. Thus

(43)    $$g(y) = \frac{y^{-1/2} e^{-y/2}}{\sqrt{2\pi}}, \qquad y > 0$$

This function defines what is known as the chi-square distribution with
one degree of freedom because it is a special case of a more general
chi-square distribution. The more general frequency function depends on a
parameter, called the number of degrees of freedom, and (43) is obtained
by setting the parameter equal to 1. The preceding result can be stated
by saying that the square of a standard normal variable possesses a
chi-square distribution with one degree of freedom. This result is used on
a number of occasions in later chapters.
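
The result in (43) can be checked numerically. The following is a minimal sketch, assuming only Python's standard library; the names `g`, `G`, `numeric_derivative`, and `empirical_cdf` are illustrative and not part of the text. It uses the identity $P\{-\sqrt{t} \le x \le \sqrt{t}\} = \operatorname{erf}(\sqrt{t/2})$ for a standard normal variable, differentiates that distribution function numerically, and also compares it with the observed proportion of squared pseudo-random normal values not exceeding t.

```python
import math
import random

def g(y):
    """Formula (43): chi-square frequency function with one degree of freedom."""
    return y ** -0.5 * math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi)

def G(t):
    """P{-sqrt(t) <= x <= sqrt(t)} for a standard normal x, via the error function."""
    return math.erf(math.sqrt(t / 2.0))

def numeric_derivative(func, t, h=1e-5):
    """Central-difference approximation to the derivative of func at t."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

def empirical_cdf(sample, t):
    """Proportion of sample values that do not exceed t."""
    return sum(1 for v in sample if v <= t) / len(sample)

# Assumption: pseudo-random standard normal values stand in for observations of x.
rng = random.Random(1)
squares = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]

for t in (0.5, 1.0, 2.0):
    print(t, numeric_derivative(G, t), g(t), empirical_cdf(squares, t), G(t))
```

For each t, the first two printed values should agree closely, and the last two should agree to about two decimal places.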


REFERENCES

Additional material on the moment generating function may be found in A. C. Aitken,
Statistical Mathematics.

The two theorems concerning the relationship between moment generating functions
and frequency functions that were needed in deriving the normal approximation to the
binomial distribution require additional mathematical training for their complete
understanding; however, they are available in E. Parzen, Modern Probability Theory
and Its Applications, John Wiley and Sons.

Discussions of other frequency functions often used as mathematical models may be
found in the preceding reference.

A number of other interesting applications of the binomial and Poisson distributions
may be found in W. Feller, An Introduction to Probability Theory and Its Applications,
John Wiley and Sons.

For binomial problems in which n is sufficiently large to yield a good normal or 
Poisson approximation much computational labor may be saved by using tables giving 
sums of binomial probabilities. Such tabulated sums are available in the National 
Bureau of Standards, AMS 6, Tables of the Binomial Probability Distribution. Tables for 
sums of Poisson probabilities are available in E. C. Molina, Poisson's Exponential 
Binomial Limit, D. Van Nostrand Co. 


EXERCISES 

1 . Calculate the mean and variance for x, the face number that comes up when 
rolling an honest die. 

2. A die is loaded so that the probability of a given face turning up is pro- 
portional to the number on that face. Calculate the mean and variance for x, 
the face number showing. 

3. Calculate the mean for x, the sum of the face numbers that come up when 
rolling 2 honest dice. 

4. Calculate the mean for the distribution given in Table 2, Chapter 2, for the 
2 altered dice. 

5. ' A random variable can assume only the values 2 and 3. If its mean is 
I , find the probabilities for those 2 points. 

6. A and B match pennies. Calculate the mean and variance of x, where x is 
the amount won by A after 2 matchings. 

7. In problem 6 calculate the mean and variance if A quits after the first 
matching, provided he wins it, but B does not employ this strategy. What does 
this result say concerning A's strategy? 

8. A coin is tossed until a head appears. If a head appears on the first toss, 
the player receives $2 from the bank. If it appears for the first time on the 
second toss, he receives $4. In general, if it appears first on toss number k, he
receives 2^k dollars. If his payment exceeds $1,000,000, he receives only $1,000,000.
Calculate the mean amount to be won by the player of this game. What effect 
would placing no limit on the amount to be won have on the mean? 

9. If an urn contains 10 white and 5 black balls and 3 balls are drawn without 
replacement, what is the mean number of black balls that will be obtained? 
Calculate the mean by using definition (1) of the text, where x denotes the number 
of black balls obtained. 

10. Given f(x) = (1/2)^x, x = 1, 2, 3, ..., and zero elsewhere, find its moment
generating function. Use it to calculate the mean and variance of x.

11. Eight dice are rolled. Calling a 5 or 6 a success, find the probability of
getting (a) 3 successes, (b) at most 3 successes.

12. Suppose a sample of 10 is taken from a day’s output of a machine that 
normally produces 5 per cent defective parts. If the day’s production is inspected 
100 per cent whenever the sample of 10 gives 2 or more defectives, what is the 
probability that a day’s production will be inspected 100 per cent? 





13. Suppose that weather records show that on the average 3 of the 30 days
in November are rainy days. (a) Assuming a binomial distribution with each day
of November as an independent trial, find the probability that next November
will have at most 2 rainy days. (b) Give reasons why you may not be justified
in using the binomial distribution in solving (a).

14. In calculating binomial probabilities, it is convenient to calculate f(x + 1)
from f(x) by the formula f(x + 1) = k(x) f(x), where
$k(x) = \frac{n - x}{x + 1}\cdot\frac{p}{q}$. Show that this formula is correct.

15. If x has the frequency function f(x) = x/2, 0 ≤ x ≤ 2, calculate the
probability that (a) both of 2 sample values will exceed 1 and (b) exactly 2 of 4
sample values will exceed 1.

16. If x has the frequency function f(x) = 1, 0 ≤ x ≤ 1, (a) what is the
probability that at least 2 of 3 sample values will exceed .6? (b) What value of x
is such that the probability is | that at least 2 of 3 sample values will exceed it? 

17. Given that a binomial variable has mean 12 and variance 8, find p 
and n. 


18. Experience shows that 10 per cent of the individuals reserving tables at a 
night club will not appear. If the night club has 50 tables and takes 53 reserva- 
tions, what is the probability that it will be able to accommodate everyone 
appearing? 

19. In the world series for baseball, the series is concluded when 1 team has
won 4 games. Let p be the probability of team A winning a single game and
assume that this probability remains constant in the series. Show that the
probabilities of the series ending in 4, 5, 6, or 7 games are .125, .25, .3125, and
.3125, respectively, when p = 1/2 and .21, .30, .27, and .22, respectively, when
p = 2/3.

20. Use the Poisson approximation to calculate the probability of getting 9
successes in 1000 trials of an experiment for which p = .01.

21. Use the Poisson approximation to calculate the probability that at most 1
person in 500 will have a birthday on Christmas. Assume 365 days in the
year.

22. Assume that the number of particles emitted from a radioactive source
follows a Poisson distribution with an average emission of 1 particle per second.
(a) Find the probability that at most 1 particle will be emitted in 3 seconds. (b)
How low an emission rate would be necessary before the probability of getting
at most 1 emission in 3 seconds would be at least .80?

23. Assume that the number of items of a certain kind purchased in a store
during a week's time follows a Poisson distribution with μ = 100. How large
a stock should the merchant have on hand to yield a probability of .99 that he
will be able to supply the demand?

24. Suppose the number of telephone calls an operator receives from 9:00 to
9:05 follows a Poisson distribution with μ = 3. (a) Find the probability that
the operator will receive no calls in that time interval tomorrow. (b) Find the
probability that in the next 3 days the operator will receive a total of 1 call in
that time interval.




25. Solve problem 13, using the Poisson approximation to the binomial 
distribution and compare answers to see how good the approximation is. 

26. Assume that customers enter a store at the rate of 120 persons per hour. 
(a) What is the probability that during a 2-minute interval no one will enter the
store? {b) What time interval is such that the probability is \ that no one will 
enter the store during that interval ? 

27. (a) Given that x possesses a Poisson distribution with mean μ, show that
the moment generating function of x is given by
$M_x(\theta) = e^{\mu(e^{\theta} - 1)}$. (b) By differentiating $M_x(\theta)$,
verify that the mean is μ and show that the variance is also equal to μ.

28. Show that the Poisson probabilities increase and then decrease unless
μ < 1. Determine what value of x (function of μ) has maximum probability.
Consider the ratio of neighboring probabilities.

29. What is the probability that one will arrive at a red signal at an inter- 
section if one’s time of arrival is by chance and the signal alternates from 20 
seconds of green to 40 seconds of red ? 

30. Two students agree to meet at a restaurant between 6 and 7 p.m. Find the 
probability that they will meet if each agrees to wait 10 minutes for the other 
and they arrive independently at random times between 6 and 7. 

31. Three points are chosen by chance on the circumference of a circle. 
What is the probability that they will all lie on a semicircle? 

32. Let t denote the life of a radio tube in hours with frequency function 
f{t) = r > 0. If a = for how many hours of life should the manu- 
facturer guarantee his tubes if he wants the probability to be .90 that a tube will 
satisfy the guarantee? 

33. A random variable has the frequency function f(x) = a + bx², 0 < x ≤ 1.
Determine a and b so that its mean will be §. 

34. If f(x) = 1, 0 < x < 1, find (a) the mean and variance of x and (b) the
mean and variance of x².

35. Given that f(x) ^ cx, 0 < x < I, find (a) c, (b) fx^' by integration, (c) 
WX (t^)^/ fromM,(0). 

36. Given that f{x) = ce~^, x > 0^ find (a) c, (b) (c) from MJfi). 

yi. Given \hdX f{x) — cx^e~^, ^ 0, a a positive integer, find {a) c, using the 

p 00 

fact that = a ! for a a positive integer, {b) ix^ from definition, (c) 

(^)^;,'from 

38. Assume that the length of telephone conversations x has the frequency 

function /(re) = Show that the probability of a conversation lasting more 

than t_1 + t_2 minutes, given that it has already lasted at least t_1 minutes, is equal
to the unconditional probability that it will last more than t_2 minutes.

39. Find the moment generating function for the triangular distribution whose
frequency function is given by f(x) = x, 0 < x < 1, f(x) = 2 − x, 1 < x ≤ 2.

40. If X is normally distributed with /< = 1 and find {q) P{x > 2} and 

ib)P{0<x<\}. 

41. If x is normally distributed with μ = 1 and σ = 2, find a number x_0 such
that (a) P{x > x_0} = .10 and (b) P{x > −x_0} = .20.




42. Assume that the life in hours of a radio tube is normally distributed with
mean 200 hours. If a purchaser requires at least 90 per cent of them to have lives
exceeding 150 hours, what is the largest value that σ can have and still have the
purchaser satisfied?

43. Assume that height of adult males is normally distributed with μ = 69
inches and σ = 3 inches. What is the conditional probability that an individual
will be taller than 72 inches if it is known that he is taller than 70 inches?

44. Fit a normal curve to the histogram for the data of problem 3, Chapter 4. 

45. Find μ_k for the normal distribution by using the integral definition and
repeated integration by parts.

46. A coin is tossed 12 times. Find the probability, both exactly and by the
normal curve approximation, of getting (a) 4 heads and (b) at most 4
heads.

47. A die is tossed 12 times. Counting a 5 or 6 as a success, what is the
probability, using the normal curve approximation, of getting (a) 4 successes and
(b) at most 4 successes?

48. Given p = .4 and n = 1350 for the binomial variable x, use the normal
approximation to calculate (a) P{510 < x ≤ 560}, (b) P{x < 580}.

49. A die is tossed 90 times. Find the probability of getting 15 aces (4) using 
the binomial formula and tables of factorials and (5) using 
approximation. 

50. Find a number x_0 such that the probability of getting a number of heads
between 500 − x_0 and 500 + x_0, inclusive, in 1000 tosses of a

51. A coin is tossed 400 times. Would 215 heads be considered a reasonable
result?

52. Experience shows that 20 per cent of a certain kind of seed germinates.
If 50 of 400 seeds germinated, would you accept the hypothesis that p = .20?

53. About 9 per cent of the population of the country is between 20 and 24
years of age. A city of 12,000 has 1300 in this age group. Test to see if this city
is typical with respect to this age group.

54. A manufacturer has found from experience that 3 per cent of his product
is rejected because of flaws. A new lot of 800 units comes for inspection.
(a) How many units would reasonably be expected to be rejected? (b) What is
the approximate probability that less than 30 units will be rejected?

55. A manufacturer of cotter pins knows that 5 per cent of his product is
defective. If he sells cotter pins in boxes of 100 and guarantees that not more
than 10 pins will be defective, what is the approximate probability that a box will
fail to meet the guaranteed quality?

56. Suppose that you wish to construct a control chart for the proportion p'
of words incorrectly typed by a stenographer per hour. If she typed 1200 words
an hour, on the average, for 6 hours a day, for 10 days, and she mistyped 360
words in that total period of time, what 2 numbers would you use for boundaries
for the control chart?

57. In the manufacturing of parts, the following data were obtained for the
daily percentage defective for a production averaging 1000 parts a day.
Construct a control chart and indicate times when production was out of control.




The data are to be read a row at a time. 


12   2.3  2.1  1.7  3.8  2.5  2.0  1.6  1.4  2.6
1.5  2.8  2.9  2.6  2.5  2.6  3.2  4.6  3.3  3.0
3.1  4.3  1.8  2.6  2.1  2.2  1.8  2.4  2.4  1.6
1.7  1.6  2.8  3.2  1.8  2.6  3.6  4.2



58. A sample is to be taken in a city to estimate the percentage of families 
willing to pay 200 dollars for a home freezer. It is desired to have, with a 
probability of .95, an estimate correct to within per cent absolute. Tentatively, 
it is estimated that the true percentage is near 20 per cent. How large a sample 
will be required? 

59. If you wished to estimate the proportion of Republicans in a certain 
district and wanted your estimate to be correct within .02 unit with a probability 
of .90, how large a sample should you take (a) if you know that the true
proportion is near .4, (b) if you have no idea what the true proportion is?

60. (a) If you rolled a die 240 times and obtained 50 sixes, would you decide 
the die was biased in favor of sixes? (b) If you repeated the experiment and 
obtained 48 sixes, would you conclude that the second experiment justified your 
decision in (a) or would you conclude differently ? 

61. Assume that telephone calls coming into a switchboard follow a Poisson 
distribution at the rate of 15 calls per minute. If the switchboard can handle at 
most 25 calls per minute, what is the probability that in one minute the switch- 
board will be overloaded? Use a normal approximation in your calculations 
based on the results of problem 27. 

62. Assuming that the number of white blood cells per unit of volume of
diluted blood counted under a microscope follows a Poisson distribution with
μ = 121, what is the probability, using a normal approximation, that a count
of 100 or less will be observed? Use the results of problem 27.

63. If the number of telephone calls coming in to a given switchboard during
a period of a minute follows a Poisson distribution with μ = 10 and the
switchboard can handle at most 20 calls per minute, (a) what is the probability,
using a normal approximation, that during the next minute the switchboard will be
overtaxed? (b) What is the approximate probability that it will not be overtaxed
during an hour's service if the numbers of calls in consecutive minutes are
assumed to be independently distributed?

64. For « = 12 and p = h plot on the same piece of graph paper (a) the 
binomial histogram, (b) the Poisson histogram, and (c) the fitted normal curve 
by ordinates. Note the extent to which (b) and (c) approximate the binomial. 

65. A source of liquid is known to contain bacteria with the mean number of 
bacteria per cubic centimeter equal to 3. Ten 1-cubic-centimeter test tubes are
filled with the liquid. Assuming the Poisson distribution is applicable, calculate 
the probability (a) that all 10 test tubes will show growth, that is, contain at least 
1 bacterium each and (b) that exactly 7 test tubes will show growth. 

66. If f{x) = cxe ^ ,x > 0, find {a) c, {b) the mean of x, and (c) the variance 

of X. 




67. In firing at a target assume that the horizontal distance a shot hits from
the center line is normally distributed with σ = 2 feet. (a) In 200 shots how many
would be expected to miss the target if it is 10 feet wide and sufficiently high?
(b) How many shots would you need to fire to be certain with a probability of
.95 of getting 50 or more shots within 3 feet of the center line?

68. Fit a Poisson function to the following "famous" data on the number of
deaths from the kick of a horse per army corps per year, for 10 Prussian Army
Corps for 20 years. The total number of units here, an army-corps year, is 200.

x      0     1     2     3     4
f    109    65    22     3     1

69. Fit a binomial function to the following data on the number of seeds
germinating among 10 seeds on damp filter paper for 80 sets of seeds.

x     0    1    2    3    4    5    6    7    8    9   10
f     6   20   28   12    8    6    0    0    0    0    0


70. A sample of 2 is taken from a box of 10 articles. If 4 of the articles are
defective, what is the probability of getting no defectives in the sample?

71. In considering a lot of 100 items, a purchaser agrees to buy if a sample of
10 shows at most 1 defective. If the lot actually contains 10 per cent defectives,
what is the probability that the purchase will be made? Compare your answer
with that based on a binomial approximation.

72. A box contains 100 items of which 5 are defective. Let x denote the
number of defectives found in a sample of 10. (a) Calculate the probability that
x = 2. (b) Use the binomial approximation to make the calculation. (c) Use the
Poisson approximation to make the calculation.

73. A bag contains 2 red, 3 green, and 4 black balls. If 4 balls are drawn in
succession with replacement each time, what is the probability of getting 1 red,
1 green, and 2 black balls?

74. Work problem 73 if there are no replacements of the drawn balls.

75. Calculate the probability of getting a total of 6 if 3 dice are thrown
simultaneously.

76. A game of chance consists in tossing a ball into boxes numbered 1, 2, 3, 
and 4. If the probabilities for landing in these boxes are -gA”? 2T» and 2^5-, 
respectively, and one receives in dollars the number on the box, what is the 
probability of winning at least 5 dollars when taking 2 tosses? 

77. What is the probability in 12 rolls of a die that each side will come up
twice? Show that no other possible result has a higher probability of occurring.

78. Given /(a?) = x >0, find, by the change of variable technlique, the 
frequency function of the variable (a) y = 1/x and (b) y = log_e x.

■ is 

79. Given /(a?) — xe a? > 0, find the frequency function of the variable 
(a) y = x + 1, (b) y = x², (c) y = log_e x.











80. Given that θ is uniformly distributed over the interval −π/2 to π/2, find
the frequency function of z = A sin θ, where A is a constant.

81. Let x be a standard normal variable. Find the frequency function of
(a) 2x + 1, (b) 2x² + 1.

82. Given that x is uniformly distributed over the interval −1 to 1, find the
frequency function of (a) x², (b) −log_e |x|.

83. Given f(x) = 2(1 − x), 0 < x < 1, find the distribution of y = x².

84. Given that x has the continuous distribution function F(x), find expressions 

for the distribution functions of the variables (a) y = (b) y = log x, (c) 

y = F(x), 

85. A variable x is said to have a log-normal distribution if y = log_e x is
normally distributed. Given that the mean and variance of y are 0 and 1, 
respectively, find the frequency function of x. 

86. Show that the Cauchy distribution given by (9), Chapter 4, does not 
possess a mean. 

87. Show that the mean of the hypergeometric distribution is μ = np by

employing the relation ~ 

88. An experiment is to be conducted 100 times to determine whether a
possible outcome has probability p = .4 or p > .4. If x denotes the number of
outcomes and if x > 48 is chosen as the critical region for testing H_0: p = .4,
find an expression for the power function of the test.

89. In problem 88 use the normal approximation to find an expression for the 
power function. 

90. A box containing 100 items has an unknown but small proportion p of 
defectives. If x denotes the number of defectives in a sample of 10 and one is 
interested in testing the hypothesis p = against p < by using x <^2 
as a critical region, find an expression for the power function of the test. 

91. Assume that the number of persons per minute buying ferry tickets is 10. 
Find an expression for the probability that at least t minutes will elapse before 
50 tickets will be sold. 




CHAPTER 6


Elementary Sampling Theory for One 
Variable 


In Chapter 5 a beginning was made in testing hypotheses and estimating
parameters; however, the problems considered there were mostly
concerned with the binomial distribution. In the present chapter this
beginning is extended to other distributions, particularly to continuous
distributions. Only a few of the simpler problems are considered here;
the more complicated problems will be studied in Chapter 11.


6.1 Random Sampling 

In the applications of the binomial distribution in the preceding chapter
it was pointed out that the binomial model is strictly valid only if the trials
of the experiment are independent and p is constant from trial to trial.
In the language of sampling, this means that samples must be obtained by
a method that possesses these two properties.

The theory that is about to be developed for continuous variables is
based on assumptions very similar to those used to derive the binomial
distribution. The first assumption is that the successive trials of the
experiment are independent and the second is that the frequency function
of the random variable remains the same from trial to trial. If the theory
is to be applicable to real experimental data, it is necessary that the data
be obtained by a sampling method that possesses these two properties.
In order to express these properties in a mathematical form, consider the
following notation and procedure.

Let f(x) be the frequency function of the continuous random variable
x and let a sample of size n be drawn. The resulting sample values are
denoted by $x_1, x_2, \cdots, x_n$. If a second sample of size n were drawn,
the resulting sample values would be denoted by $x_1', x_2', \cdots, x_n'$, and
similarly for additional samples. These values are conveniently arranged
as follows:

$$\begin{array}{llll}
x_1, & x_2, & \cdots, & x_n \\
x_1', & x_2', & \cdots, & x_n' \\
x_1'', & x_2'', & \cdots, & x_n'' \\
\vdots & \vdots & & \vdots
\end{array}$$

Now consider the values in the first column. These values may be treated
as the values of a random variable $x_1$ with a frequency function $f_1(x_1)$.
In the same manner the values in the second column may be treated as
the values of a random variable $x_2$ with frequency function $f_2(x_2)$ and
similarly for the remaining columns.

In this notation the requirement that the frequency function of the
random variable x shall remain constant from trial to trial means that
the random variables $x_1, x_2, \cdots, x_n$ must possess the original
frequency function, that is, that

$$f_1(x) = f_2(x) = \cdots = f_n(x) = f(x)$$

In this same notation the requirement that the trials shall be independent
means that the variables $x_1, x_2, \cdots, x_n$ must be independent. A method
of sampling that possesses these two properties is called random sampling.
In view of formula (24), Chapter 2, and the preceding discussion, random
sampling may be defined mathematically in the following manner.

(1) Definition: Random sampling is a method of sampling for which

$$f(x_1, x_2, \cdots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$

where f(x) is the frequency function of the population being sampled and
where $x_1, x_2, \cdots, x_n$ are the random variables corresponding to the n
trials of the sample.

Although the variable x in the preceding discussion was treated as a con- 
tinuous variable, definition (1) applies to both continuous and discrete 
variables. 
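
Definition (1) can be made concrete with a short computation. The following is a minimal sketch, assuming an exponential population $f(x) = e^{-x}$, x > 0, chosen only for illustration; the function names `joint_density` and `exponential` and the sample values are hypothetical. Under random sampling the joint frequency function of the sample is simply the product of the population frequency function evaluated at each sample value.

```python
import math

def joint_density(f, sample):
    """Definition (1): under random sampling the joint frequency function
    of the sample is the product of f over the individual sample values."""
    product = 1.0
    for x in sample:
        product *= f(x)
    return product

def exponential(x):
    """Illustrative population (an assumption): f(x) = exp(-x), x > 0."""
    return math.exp(-x) if x > 0 else 0.0

sample = [0.3, 1.2, 0.7]  # hypothetical sample values
print(joint_density(exponential, sample))
print(math.exp(-sum(sample)))  # same value, since the exponents add
```

The same function applies to a discrete variable if f is taken to be its probability function.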

As an illustration of a continuous random variable for which the
sampling method approximates random sampling, let x be the distance
the end of a spinning pointer is from the 0 point, as measured along the
circumference, after it comes to rest. Figure 1 indicates the nature of this
variable. If a sample of size 5 were desired, the pointer would be spun
five times and the distances recorded. Now, if a pointer is spun repeatedly,
and the resulting values of x are marked off into consecutive sets of five,
it will usually be found that the empirical distributions of the variables
$x_1, \cdots, x_5$ will approach the rectangular distribution f(x) = 1/c, where c
is the circumference; it will also be found that tests of independence,
which will be studied later, usually substantiate independence of trials
here.

Fig. 1. A game of chance.
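
The pointer experiment can be imitated on a computer. The following is only a rough sketch, assuming that a pseudo-random uniform generator is an acceptable stand-in for a physical pointer and that c = 10; all function names are illustrative. It marks the spins off into sets of five, then checks that each of the five column variables has a mean near c/2 and that the first two columns show little correlation, which is consistent with, though not proof of, the two properties of random sampling.

```python
import random

def spin_samples(circumference, n_samples, size=5, seed=0):
    """Simulate n_samples samples, each made up of `size` pointer distances."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, circumference) for _ in range(size)]
            for _ in range(n_samples)]

def column(samples, j):
    """Values of the j-th variable across all samples (j counted from 0)."""
    return [row[j] for row in samples]

def mean(values):
    return sum(values) / len(values)

def correlation(u, v):
    """Sample correlation coefficient of two equally long lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

c = 10.0  # assumed circumference
samples = spin_samples(c, 20_000)
for j in range(5):
    print(j + 1, round(mean(column(samples, j)), 2))
print(round(correlation(column(samples, 0), column(samples, 1)), 3))
```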

It should be noted that definition (1) defines a method of sampling and
says nothing about particular samples. It is legitimate to call a sample a
random sample only if it has been obtained by a random sampling method.

It is frequently not feasible to check many real life sampling methods
for randomness because of the expense or difficulty of obtaining enough
data to test the properties in definition (1). Then one must rely on
judgment and experience to determine whether the method is sufficiently random
to permit the use of models derived on the basis of random sampling.


6.2 Moments of Multivariate Distributions

Since random sampling involves the frequency function $f(x_1, x_2, \cdots, x_n)$,
it is necessary to study properties of this function. In particular, it is
necessary to define moments and the moment generating function for
multivariate functions. The moment notation that was introduced in
Chapter 5 becomes quite cumbersome when it is applied to multivariate
situations. Furthermore, it lacks flexibility in deriving formulas;
consequently, a new type of notation involving an expected value symbol E
is introduced here. This notation is presented first for a single random
variable in order to show its relationship to earlier material, after which it
is generalized to multivariate functions.

If the continuous random variable x has the frequency function f(x),
its expected value is defined as

(2)    $$E[x] = \int_{-\infty}^{\infty} x f(x)\,dx$$





Thus the expected value of a random variable is its mean value. More
generally, if h(x) is any function of x, the expected value of h(x) is defined
as

(3)    $$E[h(x)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$

Moments and moment generating functions, as defined in Chapter 5,
can be expressed as expected values. For example, the kth moment of x
is obtained from (3) by choosing $h(x) = x^k$ and therefore is given by

$$E[x^k] = \int_{-\infty}^{\infty} x^k f(x)\,dx$$

Similarly, the moment generating function of x is given by

$$M_x(\theta) = E[e^{\theta x}] = \int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx$$

Corresponding expressions that are valid for a discrete random variable 
can be obtained by merely replacing the preceding integrals by sums. 

Since the expected value symbol E was designed to produce mean
values, the question naturally arises whether the expected value of a
function of x, say h(x), as given by (3), is really the mean value of that
function. To show that this is so, first let y = h(x) in order to simplify
the notation. Since y is a random variable with a frequency function, say
g(y), it follows from (2) that its mean value is certainly given by the
integral

(4) $E[y] = \int_{-\infty}^{\infty} y\,g(y)\,dy$

Now if h(x) is an increasing function of x, the change of variable technique
given in (42), Chapter 5, may be employed to yield the relation

$g(y)\,dy = f(x)\,dx$

As a consequence, (4) may be written in the form

$E[y] = \int_{-\infty}^{\infty} y f(x)\,dx$

Since y = h(x), this result is equivalent to (3). Although the equivalence
of (3) and (4) was shown for h(x) an increasing function, the equivalence
holds quite generally. The advantage of evaluating the integral given
in (3) rather than in (4) to find the mean value of h(x) lies in the fact
that (3) does not require one first to find the frequency function of h(x).
When h(x) is not an increasing, or decreasing, function of x, this may
require considerable effort.
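The equivalence of (3) and (4) can be checked numerically on a small case. The sketch below assumes, purely for illustration, that x has the rectangular frequency function f(x) = 1 on (0, 1) and that h(x) = x^2, for which the frequency function of y = h(x) works out to g(y) = 1/(2 sqrt(y)) on (0, 1). The helper name midpoint_integral and the choice of example are not from the text.

```python
import math

def midpoint_integral(func, a, b, steps=200_000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

# Assumed example: x is rectangular on (0, 1), so f(x) = 1, and h(x) = x**2.
def f(x):
    return 1.0

def h(x):
    return x * x

# Formula (3): integrate h(x) f(x) dx directly.
mean_via_3 = midpoint_integral(lambda x: h(x) * f(x), 0.0, 1.0)

# Formula (4): first find the frequency function of y = h(x).
# For this example g(y) = 1 / (2 sqrt(y)) on (0, 1).
def g(y):
    return 1.0 / (2.0 * math.sqrt(y))

mean_via_4 = midpoint_integral(lambda y: y * g(y), 0.0, 1.0)

print(mean_via_3, mean_via_4)  # both should be close to 1/3
```

Route (3) needs only f and h, whereas route (4) required deriving g by hand first, which is the practical advantage noted above.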




Now consider the generalization of these definitions to multivariate
functions. In this connection let $g(x_1, x_2, \ldots, x_n)$ be any function
of the random variables $x_1, x_2, \ldots, x_n$ whose frequency function is
$f(x_1, x_2, \ldots, x_n)$. Then the expected value of
$g(x_1, x_2, \ldots, x_n)$ is defined by

(5) $E[g] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n) f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$

The variables of which g is a function have been omitted on the left
side for notational convenience. Just as for one variable, it is possible
to demonstrate that the value given by (5) is the mean value of g and
therefore is the same as that obtained by finding the frequency function
of g and applying the elementary definition (2) to the random variable g.

The particular quantities that are needed in this chapter are the kth
moment of $g(x_1, x_2, \ldots, x_n)$ and the moment generating function of
$g(x_1, x_2, \ldots, x_n)$. In terms of expected values, the kth moment of
$g(x_1, x_2, \ldots, x_n)$ is defined as

(6) $E[g^k] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g^k(x_1, x_2, \ldots, x_n) f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$

Corresponding to this definition, the moment generating function of
$g(x_1, x_2, \ldots, x_n)$ is defined as

(7) $M_g(\theta) = E[e^{\theta g}] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{\theta g(x_1, x_2, \ldots, x_n)} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$

That (7) generates moments in the same manner as (21), Chapter 5, is
easily verified by expanding $e^{\theta g}$ in a power series in $\theta$
and integrating term by term.

Since expected value methods are used to assist in the development of
the theory in this chapter, three of the most useful properties of the
expected value symbol E are derived next.

6.3 Properties of E

If c is any constant, it follows directly from (5), after factoring out c
from the integral on the right side, that

(8) $E[cg] = cE[g]$

Next, since the integral of a sum is equal to the sum of the integrals,
it follows from (5) that

(9) $E[g_1 + g_2] = E[g_1] + E[g_2]$

where $g_1$ and $g_2$ are any two functions of a set of random variables.





Finally, if $g_1$ and $g_2$ are independently distributed, which means that
their joint frequency function can be factored into the product of the two
individual frequency functions, one can write

(10) $E[g_1 g_2] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g_1 g_2 f_1(g_1) f_2(g_2)\,dg_1\,dg_2$

Here $f_1$ and $f_2$ are the frequency functions of the random variables
$g_1$ and $g_2$, respectively. The formulation in (10) is in terms of the
random variables $g_1$ and $g_2$ themselves and not in terms of the basic
random variables $x_1, x_2, \ldots, x_n$ as in (5). But the double integral
in (10) can be written in the product form

$\int_{-\infty}^{\infty} g_1 f_1(g_1)\,dg_1 \int_{-\infty}^{\infty} g_2 f_2(g_2)\,dg_2$

Since this is the product of the individual expected values, it follows that
when $g_1$ and $g_2$ are independently distributed

(11) $E[g_1 g_2] = E[g_1] E[g_2]$

It should be noted that the expected value of a sum of random variables 
is equal to the sum of their expected values, whether or not the variables 
are independently distributed, whereas the expected value of a product 
need not be equal to the product of the expected values unless the variables 
are independently distributed. 
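The contrast between the sum rule (9) and the product rule (11) can be seen on a very small discrete case, with sums in place of integrals. The sketch below uses two 0/1 variables, once independent and once with y always equal to x; the joint tables and the helper name expect are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def expect(joint, func):
    """Expected value of func(x, y) under a joint probability table."""
    return sum(prob * func(x, y) for (x, y), prob in joint.items())

half, quarter = Fraction(1, 2), Fraction(1, 4)

# Independent case: two fair 0/1 variables, joint = product of marginals.
independent = {(x, y): quarter for x in (0, 1) for y in (0, 1)}
# Dependent case: y always equals x.
dependent = {(0, 0): half, (1, 1): half}

for name, joint in (("independent", independent), ("dependent", dependent)):
    e_x = expect(joint, lambda x, y: x)
    e_y = expect(joint, lambda x, y: y)
    e_sum = expect(joint, lambda x, y: x + y)
    e_prod = expect(joint, lambda x, y: x * y)
    # First flag checks (9), second flag checks (11).
    print(name, e_sum == e_x + e_y, e_prod == e_x * e_y)
# independent True True
# dependent True False
```

In the dependent table E[xy] = 1/2 while E[x]E[y] = 1/4, so the product rule fails even though the sum rule still holds.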

As an illustration of expected value techniques, which at the same
time will illustrate how useful such methods are, consider the problem of
finding the mean and variance of a sum of independent variables.

Let $x_1, x_2, \ldots, x_n$ be a set of independent variables with means
$\mu_1, \mu_2, \ldots, \mu_n$ and variances
$\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, and let

$z = x_1 + x_2 + \cdots + x_n$

From formula (9) it follows that

(12) $\mu_z = E[z] = \sum_{i=1}^{n} E[x_i] = \sum_{i=1}^{n} \mu_i$

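Next, write

$z - \mu_z = (x_1 - \mu_1) + (x_2 - \mu_2) + \cdots + (x_n - \mu_n)$

Then

$(z - \mu_z)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \mu_i)(x_j - \mu_j)$

Application of formula (9) will give

(13) $E(z - \mu_z)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} E(x_i - \mu_i)(x_j - \mu_j)$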




The bracket notation for expected values is usually omitted, as it is here,
when it becomes cumbersome and no confusion results from doing so.
Now, since the variables $x_i$ and $x_j$ are independent random variables
when $i \ne j$, formula (11) may be applied to give

$E(x_i - \mu_i)(x_j - \mu_j) = E(x_i - \mu_i)\,E(x_j - \mu_j), \quad i \ne j$

But $E(x_i - \mu_i) = 0$; therefore (13) reduces to

$E(z - \mu_z)^2 = \sum_{i=1}^{n} E(x_i - \mu_i)^2$

Since $E(x_i - \mu_i)^2 = \sigma_i^2$, this result can be expressed as

(14) $\sigma_z^2 = \sum_{i=1}^{n} \sigma_i^2$

Formula (12) states that the mean of a sum of random variables is
equal to the sum of the means of the variables. The derivation of that
formula did not require the independence of the variables; therefore, as
stated before, the formula holds regardless of this assumption. Formula
(14) states that the variance of a sum of independent random variables is
equal to the sum of the variances of the variables.

As a particular application of these two formulas, consider the problem
of finding the mean and variance of a binomial variable. This problem
was solved in 5.2.3.2 by a direct application of moment definitions.

Let $x_1, x_2, \ldots, x_n$ be binomial variables corresponding to n
independent trials of an experiment for which p is the probability of
success in a single trial. Thus $x_i = 1$ if a success occurs and $x_i = 0$
if a failure occurs on the ith trial. Next, let

$z = x_1 + x_2 + \cdots + x_n$

In view of the definition of $x_i$, it follows that z is equal to the total
number of successes in the n trials of the experiment; consequently, the
problem is to find the mean and variance of z.

For discrete variables definition (2) must be replaced by

$E[x] = \sum_{x=0}^{\infty} x f(x)$

Since the variable $x_i$ can assume the values 0 and 1 only with
probabilities q and p, respectively, it follows from this last formula that

$E[x_i] = 0 \cdot q + 1 \cdot p = p$

This result, when applied to (12), gives

$\mu_z = np$





The technique used to calculate $E[x_i]$ may be applied to obtain the value
of $E(x_i - \mu_i)^2 = E(x_i - p)^2$. Thus

$\sigma_i^2 = E(x_i - p)^2 = (0 - p)^2 q + (1 - p)^2 p = pq$

As a result, formula (14) gives

$\sigma_z^2 = npq$

These two results, of course, agree with formulas (12), Chapter 5.
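As a check on the values np and npq, the mean and variance can also be computed directly from the binomial frequency function. The following is a minimal sketch; the function name and the values n = 20, p = 0.3 are illustrative choices, not from the text.

```python
from math import comb

def binomial_mean_and_variance(n, p):
    """Mean and variance of the number of successes, computed from the
    binomial frequency function rather than from formulas (12) and (14)."""
    q = 1.0 - p
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * f for k, f in enumerate(pmf))
    variance = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))
    return mean, variance

n, p = 20, 0.3  # illustrative values, not from the text
mean, variance = binomial_mean_and_variance(n, p)
print(mean, n * p)                  # direct sum versus np
print(variance, n * p * (1 - p))    # direct sum versus npq
```

The direct sums agree with np and npq up to floating-point rounding, while the expected value argument reaches the same results with almost no computation.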


6.4 Sum of Independent Variables

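A very useful formula for developing theory about sample means can
be obtained when the variables $x_1, x_2, \ldots, x_n$ are independent and
when $g(x_1, x_2, \ldots, x_n)$ is the special linear function

$g(x_1, x_2, \ldots, x_n) = x_1 + x_2 + \cdots + x_n$

The moment generating function of this sum is

$M_{x_1 + \cdots + x_n}(\theta) = E[e^{\theta(x_1 + \cdots + x_n)}] = E[e^{\theta x_1} e^{\theta x_2} \cdots e^{\theta x_n}]$

But because of the independence of these exponential functions, it follows
from (11) that

$M_{x_1 + \cdots + x_n}(\theta) = E[e^{\theta x_1}]\,E[e^{\theta x_2}] \cdots E[e^{\theta x_n}]$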

Since this result is used so often, it is stated in the form of a theorem. 

Theorem 1: The moment generating function of the sum of n independent
variables is equal to the product of the moment generating functions of the
individual variables; that is,

$M_{x_1 + \cdots + x_n}(\theta) = M_{x_1}(\theta) M_{x_2}(\theta) \cdots M_{x_n}(\theta)$
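Theorem 1 can be illustrated on two independent discrete variables, where the moment generating function is a finite sum. The sketch below takes a fair die and a fair 0/1 coin as an assumed example; the helper names are hypothetical. The frequency function of the sum is found by direct enumeration, and its moment generating function is compared with the product of the individual ones.

```python
import math

def mgf_discrete(pmf, theta):
    """Moment generating function E[e^(theta x)] of a discrete variable
    given as a dict {value: probability}."""
    return sum(prob * math.exp(theta * value) for value, prob in pmf.items())

def pmf_of_sum(pmf_a, pmf_b):
    """Frequency function of a + b for independent a and b."""
    total = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            total[a + b] = total.get(a + b, 0.0) + pa * pb
    return total

die = {face: 1 / 6 for face in range(1, 7)}   # a fair die
coin = {0: 0.5, 1: 0.5}                       # a fair 0/1 coin

theta = 0.3
lhs = mgf_discrete(pmf_of_sum(die, coin), theta)
rhs = mgf_discrete(die, theta) * mgf_discrete(coin, theta)
print(lhs, rhs)  # equal up to rounding
```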


6.5 Distribution of $\bar{x}$ from a Normal Distribution

In this section the distribution of a sample mean based on a random 
sample of size n from a normal population is derived. 

Let x be normally distributed with mean $\mu$ and standard deviation
$\sigma$. Consider a random sample of size n from this normal population.
The mean of such a sample,

$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$

will be a random variable because the variables $x_1, x_2, \ldots, x_n$
corresponding to the n trials of the sample are random variables. After a
particular random sample has been taken, $\bar{x}$ will be a number, but
before it has been drawn, it will be a random variable capable of assuming
any value that the original variable x can assume. For the purpose of
finding the frequency function of $\bar{x}$, consider its moment generating
function. If the first formula given in (22), Chapter 5, is used, it will
follow that

$M_{\bar{x}}(\theta) = M_{x_1 + x_2 + \cdots + x_n}(\theta/n)$

Since the sampling is random, the variables $x_1, x_2, \ldots, x_n$ are
independent, and therefore Theorem 1 may be applied to give

$M_{\bar{x}}(\theta) = M_{x_1}(\theta/n)\,M_{x_2}(\theta/n) \cdots M_{x_n}(\theta/n)$

But random sampling as given by definition (1) also implies that all
the variables $x_1, x_2, \ldots, x_n$ have the same frequency function,
namely that of x, hence the same moment generating function. Consequently,
all the M's on the right are the same function, namely, the moment
generating function of the variable x. Thus

(15) $M_{\bar{x}}(\theta) = [M_x(\theta/n)]^n$


Now, from formula (28), Chapter 5, it is known that if x is normally
distributed

(16) $M_x(\theta) = e^{\mu\theta + \sigma^2\theta^2/2}$

If this result, with $\theta$ replaced by $\theta/n$, is used in (15), that
formula will reduce to

$M_{\bar{x}}(\theta) = [e^{\mu\theta/n + \sigma^2\theta^2/(2n^2)}]^n = e^{\mu\theta + (\sigma^2/n)\theta^2/2}$

Since the function on the right, when compared with (16), is seen to be
the moment generating function of a normal variable with mean $\mu$ and
standard deviation $\sigma/\sqrt{n}$ and since a moment generating function
uniquely determines a frequency function, this result proves the following
theorem.

Theorem 2: If x is normally distributed with mean $\mu$ and standard
deviation $\sigma$ and a random sample of size n is drawn, then the sample
mean $\bar{x}$ will be normally distributed with mean $\mu$ and standard
deviation $\sigma/\sqrt{n}$.
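A small sampling experiment gives an informal check on Theorem 2. The sketch below draws repeated samples from a normal population and compares the mean and standard deviation of the resulting sample means with $\mu$ and $\sigma/\sqrt{n}$. The population values, the number of repetitions, and the seed are illustrative assumptions; a simulation of this kind suggests the result but does not prove it.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Means of `repetitions` random samples of size n from a normal
    population with mean mu and standard deviation sigma."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]

mu, sigma, n = 3.0, 1.0, 10   # illustrative values
means = simulate_sample_means(mu, sigma, n, repetitions=20_000)

print(statistics.fmean(means), mu)                # close to mu
print(statistics.stdev(means), sigma / n ** 0.5)  # close to sigma / sqrt(n)
```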

This theorem shows how the precision of a sample mean for estimating
the population mean increases as the sample size is increased. Since
the standard deviation of $\bar{x}$ measures the variation of sample
$\bar{x}$'s about $\mu$ and may be treated as a measure of the precision of
estimating $\mu$ by means of $\bar{x}$, it is clear from the theorem that it
is necessary to take four times as large a sample if one wishes to double
the precision of an estimate at hand. Figure 2 shows the graph of a normal
distribution with $\mu = 3$ and $\sigma = 1$, together with the graph of the
distribution of $\bar{x}$ for samples of size 10 drawn from it.

Fig. 2. Normal distributions of x and $\bar{x}$ for n = 10.


6.5.1 Applications 

As an illustration of the application of the theorem, consider the 
following problem. A manufacturer of string has found from past exper- 
ience that samples of a certain type of string have a mean breaking 
strength of 15.6 pounds and a standard deviation of 2.2 pounds. A time- 
saving change in the manufacturing process of this string is tried. A 
sample of 50 pieces is then taken, for which the mean breaking strength 
turns out to be 14.5 pounds. On the basis of this sample can it be con- 
cluded that the new process has had a harmful effect on the strength of 
the string? Now, experience indicates that the breaking strength of string 
is approximately normally distributed. Hence it will be assumed that 
the breaking strength x is normally distributed with $\mu = 15.6$ and
$\sigma = 2.2$. This problem can be treated as the problem of testing the
hypothesis

$H_0: \mu = 15.6$

against the alternative hypothesis

$H_1: \mu < 15.6$




In this form one is testing the hypothesis that no change has occurred in
the mean breaking strength against the possibility that the mean has been
lowered. If the sample is treated as a random sample of size 50 from the
original normal population, then, by Theorem 2, $\bar{x}$ will be normally
distributed with

$\mu_{\bar{x}} = 15.6$ and $\sigma_{\bar{x}} = 2.2/\sqrt{50} = .31$

The value of $\bar{x}$ for this one sample of 50 is 14.5; hence the
corresponding value of t is

$t = (14.5 - 15.6)/.31 = -3.55$


From Table II the probability of obtaining a value of $t \le -3.55$, hence a
value of $\bar{x} \le 14.5$, is only about .0002. Since this probability is
much smaller than the probability of .05 being used as the size of the
critical region for testing hypotheses, the value 14.5 is certainly
significant and accordingly the hypothesis is rejected.
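The arithmetic of this test can be reproduced in a few lines. The sketch below uses the figures from the string problem and the standard library's NormalDist in place of Table II; the variable names are illustrative. Without rounding the standard error to .31, t comes out near -3.54 rather than -3.55, and the left-tail probability is still about .0002.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean = 15.6, 2.2, 50, 14.5  # figures from the text

standard_error = sigma / sqrt(n)              # standard deviation of the mean
t = (sample_mean - mu0) / standard_error      # standardized sample mean
left_tail = NormalDist().cdf(t)               # P(standard normal <= t)

print(round(standard_error, 2))  # 0.31
print(round(t, 2))               # -3.54
print(left_tail)                 # about .0002, far below .05
```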

By the methods of Chapter 9 it can be shown that the choice of the left
tail of the $\bar{x}$ distribution for the critical region is the best
possible choice from the point of view of Chapter 3.

In this problem, as with most applied problems, it is necessary to
consider the reasonableness of the model being used. The normality
assumption is usually difficult to verify, unless one has a large amount of
data; however, it will be seen very shortly that little harm usually results
from x not being normally distributed. The assumption that the sample of 50
is a random sample from the production process is a more serious
assumption, partly because the effect of nonrandomness on the theory is
unknown and partly because data obtained from an industrial production line
are seldom random. Defective items tend to come in groups because of a
poor batch of material or similar causes. If the randomness assumption
in the preceding problem is reasonable, then the practical implications of
this test are that the mean breaking strength has dropped. Whether the
drop is sufficiently great to give concern is outside the scope of the
present discussion.

For problems of the type just considered it is rather common in applied
statistics to call $\sigma/\sqrt{n}$ the standard error of the mean. The
name standard error is also used in connection with random variables other
than the mean, being always the same as the standard deviation of that
random variable. The expression probable error is also fairly common in
some circles. It is related to the standard error by means of the
approximate formula P.E. = .6745 S.E. For a normal variable x the
probability is 1/2 that x will fall in the interval $\mu \pm$ P.E. Since it
is more convenient to work with standard deviations than with probable
errors, the use of the probable error is being abandoned.

As another illustration, consider the following problem. Since the
mean in the preceding problem was changed by the change in the production
process, the question arises how accurate the sample mean is as an estimate
of the new mean. As before, assume that x is normally distributed
with standard deviation 2.2 but with unknown mean $\mu$. Then $\bar{x}$ will
be normally distributed with mean $\mu$ and standard deviation .31;
consequently, the probability is .95 that $\bar{x}$ will not deviate from
$\mu$ by more than .62 unit because this deviation corresponds to two
standard deviations. Thus one can feel quite certain that the sample mean
$\bar{x} = 14.5$ does not differ from the true mean by more than .62 pound.

As a final illustration, consider the problem of determining how large
an additional sample will be needed if one wishes to estimate the true
mean in the preceding problem to within 1/2 pound. As in the preceding
problem, it is assumed that x is normally distributed and that the standard
deviation was not affected by the change in production methods. For a
sample of size n, $\bar{x}$ will then be normally distributed with mean
$\mu$ and standard deviation $2.2/\sqrt{n}$. If, as in the last illustration
of 5.3.4.5, one wishes the maximum error to be exceeded only 5 per cent of
the time, then it is necessary that the maximum error of 1/2 correspond to
two standard deviations of $\bar{x}$. Therefore n must be such that

$\frac{1}{2} = 2\sigma_{\bar{x}}$

This is equivalent to

$\frac{1}{2} = \frac{2\sigma}{\sqrt{n}} = \frac{2(2.2)}{\sqrt{n}}$

The solution of this equation to the nearest integer is n = 77. Since a
sample of size 50 is already available, only 27 additional observations
should be necessary.
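The last two illustrations reduce to two short calculations, sketched below with the figures from the text; the variable names are illustrative. The exact solution of the sample-size equation is 77.44, which the text rounds to the nearest integer; rounding up to 78 instead would keep two standard deviations of the mean strictly within 1/2 pound.

```python
from math import sqrt

sigma = 2.2          # population standard deviation assumed in the text
n_current = 50       # observations already available

# Two-standard-deviation bound on the error of the current sample mean.
bound_now = 2 * sigma / sqrt(n_current)

# Sample size for which two standard deviations of the mean equal 1/2 pound:
# 1/2 = 2 sigma / sqrt(n)  implies  n = (2 sigma / (1/2))**2.
max_error = 0.5
n_required = (2 * sigma / max_error) ** 2

print(round(bound_now, 2))            # 0.62
print(round(n_required, 2))           # 77.44
print(round(n_required) - n_current)  # 27
```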

It should be noted that the population standard deviation was as- 
sumed known in these problems. In most problems the population 
standard deviation is not known. Then the sample estimate of the stand- 
ard deviation is often used in place of the unknown population value; 
however, this procedure introduces an error. The error is not serious for 
large samples, but for small samples a more refined procedure which does 
not require such approximations is necessary. Such methods will be 
studied in Chapter 11. 




6.6 Distribution of $\bar{x}$ from Non-normal Populations

Since many variables of interest possess distributions that are not even
approximately normal, it is important to know to what extent the theory
developed on the basis of assuming normality holds for other
distributions. Here it is assumed that x is not normally distributed but
does possess a distribution for which the moment generating function exists.
Then it is shown that the distribution of $\bar{x}$ approaches a normal
distribution as the size of the sample increases.

Just as in the proof that the distribution of a binomial variable
approaches that of a normal variable as $n \to \infty$, it is necessary to
work with standard variables, that is, variables with zero means and unit
variances. It is therefore necessary to find the mean and variance of
$\bar{x}$ for non-normal variables before proceeding with the proof. This
is done by means of the expected value operator E.

From properties (8) and (9) it follows that

$E[\bar{x}] = E\left[\frac{1}{n}(x_1 + x_2 + \cdots + x_n)\right] = \frac{1}{n}\sum_{i=1}^{n} E[x_i]$

But since the sampling is random, $E[x_i] = E[x] = \mu$; consequently,

$E[\bar{x}] = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$

This shows that the mean of $\bar{x}$ is the same as the mean of x, whether
x is a normal variable or not.

Since $n\bar{x} = x_1 + x_2 + \cdots + x_n$ is the sum of n independent
variables, all of which have the same variance $\sigma^2$, it follows from
formula (14) that

$\sigma_{n\bar{x}}^2 = n\sigma^2$

But the variance of a constant times a variable is equal to the square of
the constant times the variance of the variable; therefore

$\sigma_{n\bar{x}}^2 = n^2 \sigma_{\bar{x}}^2$

Equating these two results yields the formula

$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$

This demonstrates that the variance of $\bar{x}$ for a non-normal variable
is the same as for x a normal variable. It is assumed, of course, that the
variable x possesses first and second moments.




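In view of the preceding results, it follows that the variable
$t = (\bar{x} - \mu)\sqrt{n}/\sigma$ is a standard variable. This is the
variable whose distribution will be shown to approach that of a standard
normal variable; therefore, consider next the moment generating function of
this variable. If properties (22), Chapter 5, and formula (15) are applied,
it follows that

$M_t(\theta) = e^{-\mu\sqrt{n}\,\theta/\sigma}\,M_{\bar{x}}\left(\frac{\sqrt{n}\,\theta}{\sigma}\right) = e^{-\mu\sqrt{n}\,\theta/\sigma}\left[M_x\left(\frac{\theta}{\sigma\sqrt{n}}\right)\right]^n$

Taking logarithms of both sides to the base e gives

$\log M_t(\theta) = -\frac{\mu\sqrt{n}}{\sigma}\theta + n \log M_x\left(\frac{\theta}{\sigma\sqrt{n}}\right)$

After replacing $M_x(\theta/\sigma\sqrt{n})$ by its expanded form, as given
by (5), Chapter 5, it follows that

$\log M_t(\theta) = -\frac{\mu\sqrt{n}}{\sigma}\theta + n \log\left(1 + \mu_1'\frac{\theta}{\sigma\sqrt{n}} + \mu_2'\frac{\theta^2}{2\sigma^2 n} + \cdots\right)$

If n is chosen sufficiently large, the logarithm on the right may be treated
as of the form log [1 + z] with |z| < 1 and then expanded in the same
manner as in (32), Chapter 5; hence

$\log M_t(\theta) = -\frac{\mu\sqrt{n}}{\sigma}\theta + \frac{\mu_1'\sqrt{n}}{\sigma}\theta + \frac{\mu_2' - \mu_1'^2}{2\sigma^2}\theta^2 + \text{terms in } \theta^k,\ k \ge 3$

Since $\mu_1' = \mu$ and $\mu_2' - \mu_1'^2 = \sigma^2$,

(17) $\log M_t(\theta) = \frac{\theta^2}{2} + \text{terms in } \theta^k,\ k \ge 3$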


From an inspection of terms in $\theta^k$ it will be seen that the only
function of n they contain is the factor $n^{-k/2 + 1}$. Since $k \ge 3$,
all such terms will approach 0 as n becomes infinite; consequently,

$\lim_{n \to \infty} \log M_t(\theta) = \frac{\theta^2}{2}$

which implies that

$\lim_{n \to \infty} M_t(\theta) = e^{\theta^2/2}$

The two theorems discussed in the paragraph immediately preceding
Theorem 2, Chapter 5, can now be applied to give the desired result.
Since the moment generating function of the variable
$t = (\bar{x} - \mu)\sqrt{n}/\sigma$ approaches the function
$e^{\theta^2/2}$, which is the moment generating function of a standard
normal variable, these theorems insure that the variable t possesses a
distribution that is approaching the distribution of a standard normal
variable. This result may be stated as follows.

Theorem 3: If x has a distribution with mean $\mu$ and standard deviation
$\sigma$ for which the moment generating function exists, then the variable
$t = (\bar{x} - \mu)\sqrt{n}/\sigma$ has a distribution that approaches the
standard normal distribution as n becomes infinite.
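The practical content of Theorem 3 can be seen in a sampling experiment. The sketch below assumes, for illustration only, an exponential population with mean 1 and standard deviation 1, which is far from normal, and samples of size 50. It computes the fraction of standardized sample means with |t| < 2, which for a standard normal variable is about .95. The population, sample size, repetition count, and seed are all assumptions of the sketch.

```python
import random
from math import sqrt

def standardized_means(draw, mu, sigma, n, repetitions, seed=0):
    """Values of t = (xbar - mu) sqrt(n) / sigma for repeated samples,
    where draw(rng) returns one observation from the population."""
    rng = random.Random(seed)
    values = []
    for _ in range(repetitions):
        xbar = sum(draw(rng) for _ in range(n)) / n
        values.append((xbar - mu) * sqrt(n) / sigma)
    return values

# Assumed non-normal population: exponential with mean 1 and
# standard deviation 1.
t_values = standardized_means(lambda rng: rng.expovariate(1.0),
                              mu=1.0, sigma=1.0, n=50, repetitions=20_000)

inside = sum(abs(t) < 2 for t in t_values) / len(t_values)
print(inside)  # compare with about .95 for a standard normal variable
```

Repeating the experiment with a smaller n, such as 5, shows the skewness of the population persisting in the distribution of t.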

This theorem is known as a central limit theorem. Such theorems
have been studied a great deal by mathematicians interested in probability.
Although the preceding proof required the existence of the moment
generating function of x, a proof very similar to the preceding proof can
be constructed that requires only the existence of the first two moments;
however, it requires a knowledge of complex variables. From a practical
point of view, this theorem is exceedingly important because it permits
the use of normal curve methods on problems related to means of the type
illustrated in the preceding section even when the basic variable x has a
distribution that differs considerably from normality. Of course, the more
the distribution differs from normality, the larger n must become to
guarantee approximate normality for $\bar{x}$. Sampling experiments have
shown that for n > 50 the form of f(x) has little influence on the form of
$f(\bar{x})$ for ordinary types of f(x). Figure 3 shows the empirical
distribution of $\bar{x}$ for 100 samples of size 10 each from the
rectangular distribution given by f(x) = 1, $0 \le x \le 1$, together with
the fitted normal curve. The convergence toward normality appears to be
quite rapid here.

Fig. 3. Distribution of $\bar{x}$ from a rectangular distribution.

Fig. 4. Control chart for the mean, with control limits at
$\mu + 3\sigma_{\bar{x}}$ and $\mu - 3\sigma_{\bar{x}}$ and the horizontal
axis marked from 1 to 16.


6.6.1 Applications 

The control-chart technique introduced in 5.3.4.5 was designed to check
on successive sample proportions to determine whether they behaved
like random samples from a binomial population. A similar chart may
be constructed for sample means. Because of Theorem 3, it is not essential
that the basic variable be exactly normally distributed for such charts;
consequently, they are of wide applicability. Such a chart is shown in
Fig. 4. It will be observed that the process appears to be under control.
It should be noted that the control band is a three-standard deviation
band about the mean, and therefore for a normal variable the probability
should be only .003 that a point will fall outside this band. Since many
industrial variables are not normally distributed, and since the sample
means used in control charts are often based on only four or five
measurements, one could hardly expect the probability of .003 to be very
realistic. Three-standard deviation control limits are chosen because
industrial experience has found them to be especially useful rather than
because they correspond to a desirable probability.
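The control limits and the probability of .003 can be sketched as follows. The process mean, standard deviation, and sample size below are hypothetical values chosen only for illustration, and control_limits is not a name from the text. The probability is the normal-theory value, which, as noted above, may be unrealistic for small samples from non-normal variables.

```python
from math import sqrt
from statistics import NormalDist

def control_limits(mu, sigma, n, width=3.0):
    """Lower and upper control limits for means of samples of size n."""
    half_band = width * sigma / sqrt(n)
    return mu - half_band, mu + half_band

# Hypothetical process values, chosen only for illustration.
lower, upper = control_limits(mu=10.0, sigma=0.4, n=4)

# Probability that a normal sample mean falls outside a three-standard
# deviation band about its mean.
outside = 2 * (1 - NormalDist().cdf(3.0))

print(round(lower, 2), round(upper, 2))  # 9.4 10.6
print(round(outside, 4))                 # 0.0027
```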

6.7 Distribution of the Difference of Two Means 

A frequently occurring problem in science is that of determining 
whether real differences exist between two sets of similar data. One 
method of treating the problem is to test whether the means of the popula- 
tions from which the data were obtained are essentially equal. 





Let $\bar{x}$ and $\bar{y}$ be the sample means of two sets of data based on
random samples of size $n_x$ and $n_y$, respectively. Since the samples are
random, $\bar{x}$ and $\bar{y}$ will be independently distributed. If x and
y are normally distributed, or if $n_x$ and $n_y$ are sufficiently large to
justify the practical implications of Theorem 3, $\bar{x}$ and $\bar{y}$
will be normally distributed or at least approximately so. It is assumed
therefore that $\bar{x}$ and $\bar{y}$ are normally distributed.

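Now consider the moment generating function of the variable
$\bar{x} - \bar{y}$. If Theorem 1, property (22) of Chapter 5, formula (16),
and Theorem 2 are applied in succession, it will be found that

$M_{\bar{x}-\bar{y}}(\theta) = M_{\bar{x}}(\theta)\,M_{-\bar{y}}(\theta)$
$\qquad = M_{\bar{x}}(\theta)\,M_{\bar{y}}(-\theta)$
$\qquad = e^{\mu_x\theta + (\sigma_x^2/n_x)\theta^2/2}\,e^{-\mu_y\theta + (\sigma_y^2/n_y)\theta^2/2}$
$\qquad = e^{(\mu_x - \mu_y)\theta + (\sigma_x^2/n_x + \sigma_y^2/n_y)\theta^2/2}$

Since this function is the moment generating function of a normal variable,
this result proves the following theorem.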

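Theorem 4: If $\bar{x}$ and $\bar{y}$ are normally and independently
distributed, then $\bar{x} - \bar{y}$ is normally distributed with mean
$\mu_{\bar{x}-\bar{y}} = \mu_x - \mu_y$ and standard deviation
$\sigma_{\bar{x}-\bar{y}} = \sqrt{\sigma_x^2/n_x + \sigma_y^2/n_y}$.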


6.7.1 Applications 


Consider the following problem. A potential buyer of light bulbs
bought 50 bulbs of each of two brands. On testing these bulbs, he found
that brand A had a mean life of 1282 hours with a standard deviation of
80 hours, whereas brand B had a mean life of 1208 hours with a standard
deviation of 94 hours. Can the buyer be quite certain that the two brands
do differ in quality? To answer this question, it will suffice to test the
hypothesis

$H_0: \mu_x = \mu_y$

against the alternative

$H_1: \mu_x \ne \mu_y$

Since $\bar{x}$ and $\bar{y}$ are based on samples of 50, it is safe to
assume that $\bar{x}$ and $\bar{y}$ are normally distributed. The samples
are obviously independent; hence Theorem 4 may be applied to yield the
conclusion that, under $H_0$, $\bar{x} - \bar{y}$ is normally distributed
with

$\mu_{\bar{x}-\bar{y}} = 0$ and $\sigma_{\bar{x}-\bar{y}} = \sqrt{\frac{\sigma_x^2}{50} + \frac{\sigma_y^2}{50}}$




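Since $\sigma_x$ and $\sigma_y$ are unknown, it is necessary to estimate
them by means of their sample values. Such approximations introduce an
error, but for samples as large as 50 this error is not serious. It can be
shown that the error in $\sigma_{\bar{x}-\bar{y}}$ very likely does not
exceed 10 per cent here. With these approximations

$\mu_{\bar{x}-\bar{y}} = 0$ and $\sigma_{\bar{x}-\bar{y}} = \sqrt{\frac{(80)^2}{50} + \frac{(94)^2}{50}} = 17.5$

Hence

$t = \frac{\bar{x} - \bar{y} - \mu_{\bar{x}-\bar{y}}}{\sigma_{\bar{x}-\bar{y}}} = \frac{74}{17.5} = 4.23$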

Because of the choice of $H_1$, the critical region for this test is chosen
to consist of the two equal tails of the distribution of
$\bar{x} - \bar{y}$. This choice for $H_1$ is made because there is no
external reason for believing that one brand should be better than the
other. If, as usual, a critical region of size .05 is selected, then a
value of |t| > 2 suffices to reject $H_0$. The value t = 4.23 is certainly
significant, and therefore $H_0$ is rejected. It seems quite certain that
the two brands differ in quality as far as mean burning time is concerned
and that brand A is to be preferred.
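The computation for the light bulb problem can be sketched as a small function. The function name is illustrative, and the sample standard deviations stand in for the unknown population values exactly as in the text. Without rounding the standard deviation of the difference to 17.5, t comes out near 4.24 rather than 4.23, which does not affect the conclusion.

```python
from math import sqrt

def difference_of_means_test(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Standard deviation of xbar - ybar and the standardized difference t,
    under the hypothesis that the two population means are equal."""
    se = sqrt(sd_x**2 / n_x + sd_y**2 / n_y)
    return se, (mean_x - mean_y) / se

# Brand A: mean 1282, s.d. 80; brand B: mean 1208, s.d. 94; 50 bulbs each.
se, t = difference_of_means_test(1282, 80, 50, 1208, 94, 50)

print(round(se, 1))  # 17.5
print(round(t, 2))   # 4.24
print(abs(t) > 2)    # True, so the hypothesis of equal means is rejected
```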

After a test has indicated a significant difference, it is usually of
interest to determine how large a difference in the population means may be
reasonably assumed to exist. This problem is considered in Chapter 11.
If only a point estimate of $\mu_x - \mu_y$ is desired, it suffices to
choose $\bar{x} - \bar{y} = 74$ as the estimate. This estimate is easily
shown to be the maximum likelihood estimate of $\mu_x - \mu_y$ under the
assumption that x and y are independently normally distributed.


6.8 Distribution of the Difference of Two Proportions 

If two sets of data drawn from two binomial populations are to be 
compared, it is necessary to work with the proportion of successes rather 
than with the number of successes, unless the number of trials in each set 
is the same. For example, 40 heads in 100 tosses of a coin would not be 
compared with 30 heads in 50 tosses unless they were both placed on a 
percentage basis. Now, from (34), Chapter 5, it follows that the proportion
of successes $p' = x/n$ may be assumed to be normally distributed
with mean p and standard deviation $\sqrt{pq/n}$, provided that n is large.

Let $p_1'$ and $p_2'$ be two independent sample proportions based on $n_1$
and $n_2$ trials, respectively, from two binomial populations with
probabilities $p_1$ and $p_2$, respectively, and assume that $n_1$ and $n_2$
are large enough to treat $p_1'$ and $p_2'$ as normal variables. Then, if
one proceeds with $p_1' - p_2'$ as was done in 6.7 for $\bar{x} - \bar{y}$,
it follows that

$M_{p_1'-p_2'}(\theta) = e^{p_1\theta + (p_1 q_1/n_1)\theta^2/2} \cdot e^{-p_2\theta + (p_2 q_2/n_2)\theta^2/2} = e^{(p_1 - p_2)\theta + (p_1 q_1/n_1 + p_2 q_2/n_2)\theta^2/2}$

Since this is the moment generating function of a normal variable, these
results yield the following theorem.

Theorem 5: When the numbers of trials $n_1$ and $n_2$ are sufficiently
large, the difference of the sample proportions $p_1' - p_2'$ will be
approximately normally distributed with $\mu_{p_1'-p_2'} = p_1 - p_2$ and
$\sigma_{p_1'-p_2'} = \sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$.

Just as for the simple binomial distribution, the normal approximation
will usually be satisfactory in applications if each $n_i p_i$ exceeds 5
when $p_i \le 1/2$ and $n_i q_i$ exceeds 5 when $p_i > 1/2$.


6.8.1 Applications


As a first illustration, consider the following problem. A railroad
company installed two sets of 50 red oak ties each. The two sets were
treated with creosote by two different processes. After a number of
years of service, it was found that 22 ties of the first set and 18 ties of
the second set were still in good condition. Is one justified in claiming
that there is no real difference between the preserving properties of the
two processes? To answer this question, let $p_1$ and $p_2$ denote the
respective probabilities that a railroad tie treated by the corresponding
process will be in good condition after this number of years of service.
Then set up the hypothesis

$H_0: p_1 = p_2$

against the alternative

$H_1: p_1 \ne p_2$

If the common value of $p_1$ and $p_2$ under $H_0$ is denoted by p, then by
Theorem 5 it follows that

$\mu_{p_1'-p_2'} = 0$ and $\sigma_{p_1'-p_2'} = \sqrt{\frac{pq}{50} + \frac{pq}{50}} = \frac{\sqrt{pq}}{5}$

The value of p is unknown, and so its value must be estimated from sample
values. Since the hypothesis treats the two samples as though they
were drawn from populations with the same p, the samples may be combined
into one sample of 100 for which there were 40 successes. Hence a
good estimate of p here is .40. With this estimate,

$\sigma_{p_1'-p_2'} = \frac{\sqrt{(.40)(.60)}}{5} = .098$

The situation is described geometrically in Fig. 5.

Since $p_1' - p_2' = .44 - .36 = .08$ lies well within a two-standard
deviation interval of the mean, the hypothesis $H_0$ will be accepted. The
fact that the value of p must be estimated from the sample values and
that $p_1' - p_2'$ is only approximately normally distributed makes this
test somewhat inaccurate. Both samples are large enough in this
illustration, however, to insure a fairly reliable test.

As a second illustration, consider a problem that arises frequently in 
the construction of tests. A civil-service examination is given to a group 
of 200 candidates. On the basis of their total scores, the 200 candidates 
are divided into two groups, the upper 30 per cent and the remaining 70 
per cent. Consider the first question on this examination. In the first 
group, 40 had the correct answer; in the second group, 80 had the correct 
answer. On the basis of these results, can one conclude that the first 
question is no good at discriminating ability of the type being examined 
here? To solve this problem, set up the hypothesis 

$H_0: p_1 = p_2$

where $p_1$ and $p_2$ denote the respective probabilities of an individual
from each of the two groups getting the correct answer on the first
question. The natural alternative hypothesis here is

$H_1: p_1 > p_2$

because the better candidates would be expected to do at least as well as
the weaker candidates on all questions. As before, it follows that

$\mu_{p_1'-p_2'} = 0$ and $\sigma_{p_1'-p_2'} = \sqrt{\frac{pq}{60} + \frac{pq}{140}}$

where p is the common value of $p_1 = p_2$ under $H_0$. To estimate p,
combine the two groups to give 120 successes in 200 trials, or an estimate
of .60. Using this estimate for p,

$\sigma_{p_1'-p_2'} = \sqrt{\frac{(.60)(.40)}{60} + \frac{(.60)(.40)}{140}} = .076$

Now, $p_1' - p_2' = 40/60 - 80/140 = .10$; therefore,

$t = \frac{p_1' - p_2'}{\sigma_{p_1'-p_2'}} = \frac{.10}{.076} = 1.32$

Since t < 1.64, the hypothesis $H_0$ will be accepted. This implies that the
first question is not satisfactory for distinguishing between the stronger
and the weaker candidates and therefore should be deleted from the
examination. It might happen, however, that quite a few of the questions
will fail to show discriminating ability as judged by individual
significance tests such as this but when taken together will show such
ability. Hence, from a practical point of view, one does not always reject
a question merely because it does not reject the hypothesis $H_0$.
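Both illustrations of this section follow the same steps: pool the two samples to estimate the common p, compute the standard deviation of the difference from Theorem 5, and standardize the observed difference. The sketch below collects these steps in one function whose name and return layout are illustrative assumptions. For the examination question the unrounded value of t is about 1.26; the text's 1.32 comes from rounding the difference to .10 and the standard deviation to .076, and both values lie below 1.64.

```python
from math import sqrt

def pooled_proportion_test(successes_1, n_1, successes_2, n_2):
    """Observed difference of sample proportions, its standard deviation
    under the hypothesis p_1 = p_2 (with p estimated by pooling the two
    samples), and the standardized difference t."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    p = (successes_1 + successes_2) / (n_1 + n_2)
    se = sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))
    return p_1 - p_2, se, (p_1 - p_2) / se

# Railroad ties: 22 of 50 and 18 of 50 still in good condition.
diff_ties, se_ties, t_ties = pooled_proportion_test(22, 50, 18, 50)
print(round(diff_ties, 2), round(se_ties, 3), round(t_ties, 2))

# Examination question: 40 of 60 and 80 of 140 answered correctly.
diff_exam, se_exam, t_exam = pooled_proportion_test(40, 60, 80, 140)
print(round(diff_exam, 2), round(se_exam, 3), round(t_exam, 2))
```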


6.9 Chi-square Distribution 

The techniques that have been developed in this chapter enable one to
solve certain problems relating to radial distances. As an example,
suppose a marksman is shooting at a circular target and suppose it may be
assumed that the horizontal and vertical components of his errors are
independently normally distributed with a common variance. If x and y
denote those errors, as shown in Fig. 6, then the radial error is given by
$r = \sqrt{x^2 + y^2}$. The frequency function of r can be obtained by the
methods of this chapter.



Fig. 6. A radial error problem. 




A similar problem in three dimensions would involve the sum of squares 
of three random variables. Now, it is not much more difficult to treat n 
variables than three variables; therefore, in order to have a general result 
that will handle all such problems, the general situation is considered 
here. Furthermore, it will be discovered later that this general result is 
very useful in the development of certain branches of statistical theory. 

It is a simple matter, by means of (42), Chapter 5, to find the distribu- 
tion of the square root of a variable if one knows the distribution of the 
variable itself; therefore it will suffice to find the distribution of the 
variable 

$$w = \sum_{i=1}^{n} x_i^2$$

where the variables $x_1, x_2, \ldots, x_n$ constitute a random sample from a
normal population with mean 0 and standard deviation 1. From the
formula of Theorem 1 in section 6.4, together with properties of random 
samples, it follows that 


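(18) $$M_w(\theta) = M_{x_1^2}(\theta) \cdots M_{x_n^2}(\theta) = M_{x^2}^{\,n}(\theta)$$

Now x is a standard normal variable; therefore

$$M_{x^2}(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\theta x^2} e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}(1 - 2\theta)}\,dx$$

Let $y = x\sqrt{1 - 2\theta}$; then this integral reduces to

$$M_{x^2}(\theta) = (1 - 2\theta)^{-\frac{1}{2}}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy = (1 - 2\theta)^{-\frac{1}{2}}$$

From this result and (18), it therefore follows that

(19) $$M_w(\theta) = (1 - 2\theta)^{-\frac{n}{2}}$$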

All that remains to be done is to find a frequency function corresponding
to this moment generating function and then apply the uniqueness
argument concerning frequency functions and moment generating functions.
In the method employed here the answer is written down and its
correctness is verified. That is, a particular frequency function is written
down, and it is then shown that a variable having this frequency function
possesses the moment generating function given by (19). The frequency
function that corresponds to (19) defines what is known as the chi-square
distribution. A special case of this distribution was obtained in (43),
Chapter 5. Although it is a cumbersome variable to use, the basic variable
here is usually denoted by the Greek letter $\chi^2$. In terms of this notation,
the chi-square distribution is defined as follows:


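(20) Chi-Square Distribution: $$f(\chi^2) = \frac{(\chi^2)^{\frac{\nu}{2} - 1}\, e^{-\frac{\chi^2}{2}}}{2^{\frac{\nu}{2}}\, \Gamma\left(\frac{\nu}{2}\right)}$$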

The symbol $\Gamma(x)$ denotes the gamma or factorial function of x, which
has the property that $\Gamma(x + 1) = x\Gamma(x)$. Because of this property, a
table of values of the gamma function for $1 \le x \le 2$ will suffice to evaluate
the function for other positive values of x. If x is an integer, no tables
are needed because $\Gamma(1) = 1$ and then $\Gamma(x + 1) = x!$. The necessary
tables for x not an integer can be found in any handbook of mathematical
tables. In particular, they show that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.

The parameter $\nu$ is called the number of degrees of freedom of the
distribution. This name is given to it because it is equal to the number of
independent variables occurring in w. If $\nu$ is set equal to 1, it will be
observed that (20) reduces to the function obtained in (43), Chapter 5.
For the more general problem being considered here, the value of $\nu$ is
n. A graph of (20) for several values of $\nu$ is given in Fig. 7.



Fig. 7. Distribution of $\chi^2$ for various degrees of freedom.


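For convenience in finding the moment generating function of a $\chi^2$
variable, let the variable be denoted by z. Then the generating function
of $\chi^2$ will be given by

$$M_z(\theta) = \frac{1}{2^{\frac{\nu}{2}}\, \Gamma\left(\frac{\nu}{2}\right)} \int_0^{\infty} e^{\theta z} z^{\frac{\nu}{2} - 1} e^{-\frac{z}{2}}\,dz = \frac{1}{2^{\frac{\nu}{2}}\, \Gamma\left(\frac{\nu}{2}\right)} \int_0^{\infty} z^{\frac{\nu}{2} - 1} e^{-\frac{z}{2}(1 - 2\theta)}\,dz$$

Let $y = z(1 - 2\theta)/2$; then $dz = 2\,dy/(1 - 2\theta)$ and

$$M_z(\theta) = \frac{(1 - 2\theta)^{-\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)} \int_0^{\infty} y^{\frac{\nu}{2} - 1} e^{-y}\,dy$$

But, as shown in any standard set of mathematical tables, this integral is
the integral that defines $\Gamma(\nu/2)$; hence the moment generating function
of $\chi^2$ for $\nu$ degrees of freedom is given by

(21) $$M_{\chi^2}(\theta) = (1 - 2\theta)^{-\frac{\nu}{2}}$$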


A comparison of this result with formula (19) shows that w has the
moment generating function of a $\chi^2$ variable with n degrees of freedom.
Since a distribution is uniquely determined by its moment generating
function, the preceding derivation proves the following theorem.

Theorem 6: If x is normally distributed with zero mean and unit variance,
the sum of the squares of n random sample values of x has a $\chi^2$ distribution
with n degrees of freedom.
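The verification step in this derivation can also be checked numerically. The sketch below is an illustration only, with hypothetical function names; it integrates $e^{\theta z}$ against the frequency function (20) by a simple midpoint rule and compares the result with the closed form $(1 - 2\theta)^{-\nu/2}$. The choices $\nu = 3$ and $\theta = .2$, the upper limit of integration, and the number of steps are arbitrary assumptions; any $\theta < 1/2$ would serve.

```python
import math

def chi_square_pdf(z, nu):
    """Frequency function (20) of a chi-square variable with nu degrees of freedom, z > 0."""
    return z ** (nu / 2 - 1) * math.exp(-z / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def mgf_numeric(theta, nu, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of E[exp(theta * z)]; meaningful only for theta < 1/2."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.exp(theta * z) * chi_square_pdf(z, nu)
    return total * h

nu, theta = 3, 0.2
numeric = mgf_numeric(theta, nu)
closed_form = (1 - 2 * theta) ** (-nu / 2)
print(numeric, closed_form)
```

Agreement of the two printed values to several decimal places is consistent with (21), although a numerical check of this kind is of course not a proof.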


As an illustration of the use of this distribution in radial error problems,
consider the problem of calculating the probability that an antiaircraft
shell designed to burst at a specified point in space will have a radial error
of at most 100 feet if it is assumed that the x, y, and z errors of a coordinate
system with the origin at the specified point are independently
normally distributed with a common standard deviation of 50 feet. For
convenience of notation, let $x_1 = x/50$, $x_2 = y/50$, and $x_3 = z/50$. Then
$x_1$, $x_2$, $x_3$ will be independent, normally distributed variables with zero
means and unit variances; consequently, by Theorem 6, $x_1^2 + x_2^2 + x_3^2$
will possess a $\chi^2$ distribution with three degrees of freedom. Using (20),
one can therefore write

$$P\{\sqrt{x^2 + y^2 + z^2} \le 100\} = P\{x^2 + y^2 + z^2 \le 100^2\} = P\{x_1^2 + x_2^2 + x_3^2 \le 4\} = \int_0^4 \frac{w^{\frac{1}{2}} e^{-\frac{w}{2}}}{2^{\frac{3}{2}}\, \Gamma\left(\frac{3}{2}\right)}\,dw$$


Although such tables are not presented in this book, there are statistical
tables that enable one to evaluate integrals of this type. Reference to
the proper tables will show that the preceding answer reduces to P = .74.
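In place of tables, the integral for three degrees of freedom can be evaluated directly. The following sketch, with hypothetical function names, uses a closed form obtained by integrating (20) by parts for $\nu = 3$ and cross-checks it with a midpoint-rule integration of (20); the step count is an arbitrary assumption. Both computations give a probability of about .74.

```python
import math

def chi_square_cdf_3df(w):
    """P{chi-square with 3 degrees of freedom <= w}, from integrating (20) by parts."""
    return math.erf(math.sqrt(w / 2)) - math.sqrt(2 * w / math.pi) * math.exp(-w / 2)

def chi_square_cdf_numeric(w, nu, steps=40_000):
    """Midpoint-rule integration of the frequency function (20) from 0 to w."""
    h = w / steps
    norm = 2 ** (nu / 2) * math.gamma(nu / 2)
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += u ** (nu / 2 - 1) * math.exp(-u / 2) / norm
    return total * h

sigma, radius = 50.0, 100.0
w_limit = (radius / sigma) ** 2  # equals 4 in the illustration
p_closed = chi_square_cdf_3df(w_limit)
p_numeric = chi_square_cdf_numeric(w_limit, 3)
print(round(p_closed, 3), round(p_numeric, 3))
```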

The $\chi^2$ distribution was introduced here in connection with radial
distance problems; however, it is one of the most important theoretical
distributions in statistics and appears repeatedly in later chapters in
connection with various types of problems.

REFERENCES

An interesting discussion of random sampling may be found in M. G. Kendall, The
Advanced Theory of Statistics, Vol. 1, Griffin and Co.

A proof of Theorem 3 that requires some additional mathematical background may be
found in E. Parzen, Modern Probability Theory and Its Applications, John Wiley and Sons.

An elementary discussion of the control chart technique for the mean is available in
L. H. C. Tippett, Technological Applications of Statistics, John Wiley and Sons.

Tables for evaluating the integral in connection with the $\chi^2$ distribution are available in
Fisher and Yates, Statistical Tables, Oliver and Boyd.


EXERCISES

1. Suggest how to sample randomly from (a) students at a university, (b)
households in a city, (c) the adult public, (d) a carload of wheat.

2. Explain which features of random sampling are satisfied and which features
are not satisfied if you wish to estimate the distribution of students' grade-point
averages and do so by taking a sample of 100 students from the registration files
by consulting a table of random numbers corresponding to the enrollment but
always ignoring any grade-point average less than .8. Assume that the student
enrollment is very large.

3. Show that $E(x - c)^2$ is a minimum when $c = \mu$.



4. Give an example of two random variables for which $E[xy] \ne E[x]E[y]$.

5. Give an example of 2 dependent random variables for which the variance
of their sum is (a) larger than the sum of their variances, (b) smaller than the sum
of their variances.

6. Given/(a;, y) == a: >0,y >0, (a) calculate the value of E[z\ where 

z = X y,{b) calculate the value of E[z\ (c) find MJfi), 

7. By using moment generating function methods, show that the sum of 2 
independent binomial variables with the same parameter p is also a binomial 
variable. How could you argue this result directly ? 

8. By using moment generating function methods and the result in problem
27, Chapter 5, show that the sum of 2 independent Poisson variables with means
$\mu_1$ and $\mu_2$ is also a Poisson variable with mean $\mu_1 + \mu_2$.

9. Using the methods of problem 8, explain why the difference of 2 independent
Poisson variables will not be a Poisson variable.

10. Let x have the distribution $f(x) = pq^x$, x = 0, 1, 2, .... (a) Calculate E[x]
by using the formula $1/(1 - q) = 1 + q + q^2 + \cdots$ and its derivative.
(b) Calculate the variance of x by first calculating E[x(x - 1)] by means of
similar techniques.

11. Cards numbered 1 through n are shuffled and laid out in a line. Let
$x_k = 1$ if the number on the kth card is smaller than the number on the (k + 1)th
card and let $x_k = 0$ otherwise. Let $z = \sum_{k=1}^{n-1} x_k$, which means that z represents the
total number of increases in the sequence. Calculate the mean and variance of z.

12. If x is normally distributed with $\mu = 20$ and $\sigma = 5$, calculate the probability
that (a) $x > 21$, (b) $\bar{x} > 21$, $\bar{x}$ based on a random sample of size 25.

13. Past experience indicates that wire rods purchased from a company have
a mean breaking strength of 400 pounds and a standard deviation of 15 pounds.
(a) If 16 rods are selected, between what 2 values could you reasonably expect
their mean to be? (b) How many rods would you select so that you would be
certain with a probability of .95 that your resulting mean would not be in error
by more than 2 pounds?

14. If you wish to estimate the mean of a normal population whose variance 
is 10, how large a sample should you take so that the probability is .80 that your 
estimate will not be in error by more than .4 unit? 

15. A research worker wishes to estimate the mean of a population by using a 
sample large enough that the probability will be .95 that the sample mean will 
not differ from the true mean by more than 25 per cent of the standard deviation. 
How large a sample should be taken ? 

16. The following data represent the initial velocities in meters per second of
projectiles fired from the same gun. (a) Determine the accuracy of the sample
mean $\bar{x}$ as an estimate of the true mean velocity. (b) Calculate the approximate
probability, using the value of s for the data in place of $\sigma$, that a sample mean $\bar{x}$
based on a sample of this size will deviate more than 1 unit from the true mean.
(c) Why are the methods used in (a) and (b) not very satisfactory here?

455 454 450 453 452 451 450 454

451 451 457 454 450 454 454



17. Suppose one mixes the ingredients for concrete to attain a mean breaking
test of 2000 pounds. If the mean drops below 1800 pounds, the composition will
be changed. How many tests will need to be made in order that $\alpha = .05$ and
$\beta = .10$ in testing the hypothesis $H_0: \mu = 2000$ against $H_1: \mu = 1800$ if $\sigma = 200$
and one assumes normality?

18. Find an expression for the power function of the test when testing $H_0: \mu = 10$
against $H_1: \mu > 10$ for a normal variable with unit variance. Use the right
tail of the $\bar{x}$ distribution as critical region with n = 25 and $\alpha = .10$.

19. Have each member of the class perform the following experiment 10 times.
Select 10 one-digit random numbers from the table of random numbers in the
back of the book. Calculate the mean for each set of 10. Bring these 10 experimental
means to class, where the total set of such means may be classified, the
histogram drawn, and the mean and standard deviation computed. These
results should then be compared with those expected under Theorem 3. The
population here has $\mu = 4.5$ and $\sigma = 2.87$.

20. The same test was given to 2 classes. The first class of 20 students averaged 
123 points with a standard deviation of 32 points ; the second class of 30 averaged 
130 points with a standard deviation of 24 points. Is it safe to conclude that the 
second class is superior ? 

21. Two sets of 100 students each were taught to read by 2 different methods.
After instruction was over, a reading test gave the following results: $\bar{x} = 73.4$,
$\bar{y} = 70.3$, $s_x = 8$, $s_y = 10$. (a) Test the hypothesis that $\mu_x = \mu_y$. (b) Determine
the accuracy of $\bar{x} - \bar{y}$ as an estimate of $\mu_x - \mu_y$. (c) Determine how large an
equal-size sample from each group should have been used if it were desired to
estimate $\mu_x - \mu_y$ to within 1 unit with a probability of .95.

22. Suppose that you wish to test whether there is a tendency for an individual's
right foot to be longer than his left foot. (a) Explain why it would be incorrect
to take a random sample of, say, 100 individuals and apply the usual technique
for testing $H_0: \mu_x = \mu_y$, where x and y correspond to the right and left foot,
respectively. (b) Explain how you could sample differently or handle the data
differently to overcome the difficulty here.

23. Suppose that $\bar{x}$ and $\bar{y}$ are the means of 2 samples of size n each from a
normal population with variance $\sigma^2$. Determine n so that the probability will be
about .99 that the 2 sample means will differ by less than $\sigma$.

24. Two different samplers X and Y were sent into the same forest to select
trees at random. Each sampler took a sample of 100 trees and measured their
diameters with the following results: $\bar{x} = 19.2$, $\bar{y} = 20.3$, $s_x = 3.2$, $s_y = 2.6$.
(a) Does the smaller standard deviation for Y imply that he is a more accurate
sampler than X? (b) What conclusions can be drawn concerning the accuracy
of X and Y? (c) If you knew that the true mean was 19.7, could you draw any
further conclusions?

25. Suppose = Jand<Ta, = = 1 for 2 independent normal variables. 

How large an equal-size sample from each population should be taken so that
the probability of rejecting the false hypothesis $H_0: \mu_x = \mu_y$ will be .90 if the
critical region is two-sided and $\alpha = .05$?

26. In a large-scale experiment 2000 children were split into 2 groups of 1000
each. One group received a serum for the prevention of a disease; the other
group received no serum. The number of children in each group who contracted
the disease was 30 and 50, respectively. Treating these 2 numbers as sample
values of 2 Poisson variables which may be considered as approximately normally
distributed, test the hypothesis that

27. In a poll taken among college students, 46 of 200 fraternity men favored a 
certain proposition, whereas 51 of 300 nonfraternity men favored it. Is there a 
real difference of opinion on this proposition ? 

28. A manufacturer of housedresses sent out advertising by mail. He sent 
samples of material to each of 2 groups of 1000 women, but for 1 group he used 
a white envelope and for the other group he used a blue envelope. He received 
orders from 10 and 13 per cent, respectively. Is it quite certain that the blue 
envelope will help sales ? 

29. A civil service examination was given to 200 people. On the basis of their 
total scores, they were divided into the upper 30 per cent, the middle 40 per cent, 
and the lower 30 per cent. On a certain question, 39 of the upper group and 29 
of the lower group answered correctly. Is this question likely to be useful for 
discriminating the ability of the type being tested ? 

30. If the percentage of defective parts turned out by the same machine on 2 
consecutive days is 6 and 8 per cent and 500 parts were turned out on each of 
those days, would the inspector be justified in claiming that the quality had 
slipped ? 

31. A test of 100 youths and 200 adults showed that 50 of the youths and 60 
of the adults were poor drivers. Use these data to test the claim that the youth 
percentage of poor drivers is larger than the adult percentage by 10 percentage 
points, against the possibility of a still larger difference. 

32. Two players each play a game of chance 100 times. If 1 dollar is paid 
for every win and the probability of winning at a single trial is j, what is the 
approximate probability that the first player will finish with at least 5 dollars 
more than the second player ? 

33. (a) Construct a control chart for $\bar{x}$ for the following data on the blowing
time of fuses, samples of 5 being taken every hour. Each set of 5 has been arranged
in order of magnitude. Estimate $\sigma_{\bar{x}}$ by first estimating $\sigma$ by means of s calculated
for all 60 values. (b) Comment on whether production seems to be under control,
assuming that these are the first data collected.

 42   42   19   36   42   51   60   18   15   69   64   61
 65   45   24   54   51   74   60   20   30  109   91   78
 75   68   80   69   57   75   72   27   39  113   93   94
 78   72   81   77   59   78   95   42   62  118  109  109
 87   90   81   84   78  132  138   60   84  153  112  136


34. Using moment generating function methods and the results of problem 
6 (c) and of problem 37 (c), Chapter 5, find the frequency function of z = x y 
given in problem 6. 

35. Prove that if x and y are independent variables having the same rectangular
distribution with range 1 and mean $\tfrac{1}{2}$, then x + y will not have a rectangular
distribution.




36. Calculate < 1} and P{xj^ 4- < 2} for a random sample of 

size 2 from the frequency function (a) f(x) =|, —1 ^ x ^ I, and {b) f\x) = 

37. Find the frequency function of nx for a random sample of size n from the 
population with frequency function /(a;) = x > 0, 

38. Prove that a linear combination of independent normal variables is also a
normal variable.

39. Use the moment generating function of a $\chi^2$ variable with $\nu$ degrees of
freedom to find its mean and variance.

40. Use expected value operator methods on $w = \sum_{i=1}^{n} x_i^2$ to find the mean and
variance of a $\chi^2$ variable with n degrees of freedom.

41. If $x_1, x_2, \ldots, x_n$ is a random sample from the distribution with frequency
function f(x), calculate $P\{x_1 < t, \ldots, x_n < t\}$ and use it to find the frequency
function of the variable $z = \max\{x_1, \ldots, x_n\}$. Express the result in terms of
f(x) and its distribution function F(x).

42. Use the result in problem 41 to find the distribution of the 

piece of electronic equipment that has « vital parts with lifetimes x^- '- • ,x^ 
which are independently and identically distributed with frequency function 
f(x) = X ^ 0, if it is assumed that one functipning vital part is sufficient 

to operate the equipment. 








CHAPTER 7 


Correlation and Regression 


The statistical methods presented thus far have been largely concerned 
with a single variable x and its frequency function. Many of the problems 
in statistical work, however, involve several variables. This chapter is 
devoted to explaining some of the simpler methods for dealing with data 
associated with two or more variables. The emphasis is on two variables. 
Chapter 8 is concerned with the construction of models and other theo- 
retical aspects of the problems brought up in this chapter. 

In some problems the several variables are studied simultaneously to 
see how they are interrelated; in others there is one particular variable 
of interest and the remaining variables are studied for their possible aid 
in throwing light on this particular variable. These two classes of prob- 
lems are usually associated with the names of correlation and regression 
respectively. They are considered in that order. 


7.1 Linear Correlation 

A simple correlation problem arises when an individual asks himself 
whether there is any relationship between a pair of variables that interests 
him. For example, is there any relationship between smoking and heart 
ailments, between music appreciation and scientific aptitude, between 
radio reception and sunspot activity, between beauty and brains ? 

Consider two random variables x and y and the problem of determining 
the extent to which they are related. The investigation of the relationship 
between two such variables, based on a set of n pairs of measurements 
$(x_1, y_1), \ldots, (x_n, y_n)$, usually begins with an attempt to discover
the approximate form of the relationship by graphing the data as n
points in the x,y plane. Such a graph is called a scatter diagram. By 
means of it one can quickly discern whether there is any pronounced 
relationship and, if so, whether the relationship may be treated as approxi- 
mately linear. 




As an illustration, consider the data of Table 1 consisting of the scores
of 30 students on a language test x and a science test y. The maximum
possible score on each of these tests was 50 points. The choice of which
variable to call x and which to call y is arbitrary here. The scatter diagram
for these data is shown in Fig. 1.

Table 1

 x    y      x    y      x    y
34   37     28   30     39   36
37   37     30   34     33   29
36   34     32   30     30   29
32   34     41   37     33   40
32   33     38   40     43   42
36   40     36   42     31   29
35   39     37   40     38   40
34   37     33   36     34   31
29   36     32   31     36   38
35   35     33   31     34   32

An inspection of this scatter diagram shows that there is a tendency
for small values of x to be associated with small values of y and for large
values of x to be associated with large values of y. Furthermore, the
general trend of the scatter is that of a straight line. For variables such
as these, it would be desirable to be able to measure in some sense the
degree to which the variables are linearly related. For the purpose of
devising such a measure, consider what properties would be desirable.

Fig. 1. Scatter diagram for language and science scores.

Fig. 2. Scatter diagram for standardized scores.

A measure of relationship should certainly be independent of the choice 
of origin for the variables. The fact that the scatter diagram of Fig. 1 
was plotted with the axes conveniently chosen to pass through the point 
(25, 25) implies that the relationship was admitted to be independent of 
the choice of origin. This property can be realized by using the variables 

$x_i$ and $y_i$ in the forms $x_i - \bar{x}$ and $y_i - \bar{y}$ in the construction of the desired
measure.

A measure of relationship should also be independent of the scale of
measurement used for x and y. Thus, if the x and y scores of Table 1 were
doubled in order to make them appear like conventional test scores, the
relationship between the variables should be unaffected thereby. This
property can be realized by dividing x and y by quantities which possess
the same units as x and y. For reasons that will be appreciated presently,
the quantities that are chosen here are $s_x$ and $s_y$. Both properties will
therefore be realized if the measure of relationship is constructed by using
the variables $x_i$ and $y_i$ in the forms $u_i = (x_i - \bar{x})/s_x$ and $v_i = (y_i - \bar{y})/s_y$.
This merely means that the $x_i$ and $y_i$ should be measured in sample
standard units.

The scatter diagram of the points $(u_i, v_i)$ for the data of Table 1 is
shown in Fig. 2. It will be observed that most of the points are located 
in the first and third quadrants and that the points in those quadrants 



CORRELATION AND REGRESSION I" 163 

tend to have larger coordinates, in magnitude, than those in the' second 

and fourth quadrants. A simple measure of this property of the' scatter 

% 

is the sum 2 The terms of this sum that are contributed by points 

in the first and third quadrants will be positive, whereas those corre- 
sponding to points in the second and fourth quadrants will be negative, 
A large positive value of this sum would therefore seem to indicate a 
strong linear trend in the scatter diagram. This is not strictly true, however, 
for if the number of points were doubled without changing the nature of 
the scatter, the value of this sum would be approximately doubled. It is 
therefore necessary to divide this sum by w, the number of pointsj before 

using it as a measure of relationship. The resulting sum, 2 is the 

1=1 ' i’ 

desired measure of relationship. It is called the correlation cckfficient 
and is denoted by the letter r\ hence, in terms of the original nieasure- 
ments, r is defined by the following formula. 

n i 

2 (*i - - F) “ 

(1) Correlation Coefficient : r = — — i 

ns^Sy 

Calculations will show that r = .66 for the data of Table 1. In order
to interpret this result and discover what values of r are likely to be obtained
for various degrees and types of relationship between x and y,
consider the scatter diagrams of Fig. 3. The first four diagrams correspond
to increasing degrees, or strength, of linear relationship. If these
diagrams were rotated about the y axis through 180° so that the scatters
appeared in the second quadrant, the scatters would have downward
trends rather than upward trends and the corresponding values of r
would be the negatives of the listed values. Thus the magnitude of r
determines the strength of the relationship, whereas the sign of r tells one
whether y tends to increase, or decrease, with x. The fifth diagram illustrates
a scatter in which x and y are closely related but in which the
relationship is not linear. This illustration points out that r is a useful
measure of the strength of the relationship between two variables only
when the variables are linearly related.

The diagrams of Fig. 3, together with the associated values of r, make
plausible two properties of r, namely, that the value of r must satisfy
the inequality $-1 \le r \le 1$ and that the value of r will be equal to $\pm 1$ if,
and only if, the points of the scatter lie on a straight line. A demonstration
of these properties is somewhat lengthy and therefore will not be
undertaken here; however, it is available in the appendix.






Fig. 3. Scatter diagrams and their associated values of r. 


7.1.1 Interpretation of r

The interpretation of a correlation coefficient as a measure of the 
strength of the linear relationship between two variables is a purely 
mathematical interpretation and is completely devoid of any cause or 
effect implications. The fact that two variables tend to increase or de- 
crease together does not imply that one has any direct or indirect effect 
on the other. Both may be influenced by other variables in a manner that 
will give rise to a strong mathematical relationship. For example, over a
period of years the correlation coefficient between teachers' salaries and
the consumption of liquor turned out to be .98. During this period of
time there was a steady rise in wages and salaries of all types and a general
upward trend of good times. Under such conditions, teachers' salaries
would also increase. Moreover, the general upward trend in wages and
buying power would be reflected in increased purchases of liquor. Thus
this high correlation merely reflects the common effect of the upward
trend on the two variables. Correlation coefficients must be handled
with care if they are to give sensible information concerning relationships
between pairs of variables. Success with correlation coefficients requires
familiarity with the field of application as well as with their mathematical
properties.


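7.1.2 Calculation of r


The formula in (1) that defines r is not always convenient for computational
purposes. A better form is obtained by multiplying out
factors and inserting values for $s_x$ and $s_y$:

(2) $$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left[\sum x_i^2 - n\bar{x}^2\right]\left[\sum y_i^2 - n\bar{y}^2\right]}}$$

This last form requires the sums of x, y, $x^2$, $y^2$, and xy, all of which are
readily calculated with modern electric calculators.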
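As a check on the value r = .66 quoted for Table 1, the computing form can be carried out in a short Python sketch. The sketch assumes that the entries of Table 1 pair off as consecutive (x, y) values, and it uses the algebraically equivalent arrangement in which numerator and denominator are multiplied through by n so that only the five sums are needed; the variable names are illustrative only.

```python
import math

# Table 1, read as consecutive (language score x, science score y) pairs.
pairs = [
    (34, 37), (28, 30), (39, 36), (37, 37), (30, 34), (33, 29),
    (36, 34), (32, 30), (30, 29), (32, 34), (41, 37), (33, 40),
    (32, 33), (38, 40), (43, 42), (36, 40), (36, 42), (31, 29),
    (35, 39), (37, 40), (38, 40), (34, 37), (33, 36), (34, 31),
    (29, 36), (32, 31), (36, 38), (35, 35), (33, 31), (34, 32),
]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xx = sum(x * x for x, _ in pairs)
sum_yy = sum(y * y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

# Computing form of r, with every term multiplied through by n.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
r = numerator / denominator
print(round(r, 2))
```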

If the data are so numerous that the preceding computations: would 
become unduly lengthy even with a calculating machine, then it may be 
worthwhile to classify the data with respect to both variables, just as 
was done for one variable in Chapter 4 in calculating x. When the data 
have been so classified, the short method of computation used for finding 
means may be employed to advantage in computing r. Let

$$x_i = c_x u_i + x_0$$

and

$$y_i = c_y v_i + y_0$$

where $c_x$ and $c_y$ are class intervals and u and v are new integral variables.
Then, because of the property of being independent of the choice of
origin and choice of scale for x and y, the value of r calculated for the
integral variables u and v will be the same as for x and y. This fact may
be verified directly by substituting these changes of variables in (1) and
simplifying.





7.1.3 Reliability of r 

In any given problem involving linear correlation, the value of r may
be thought of as the first sample value of a sequence of sample values
$r_1, r_2, r_3, \ldots$ that would be obtained if repeated sets of similar data were
obtained. Such sets of data are thought of as having been obtained
from drawing random samples of size n from some population. For
example, the data of Table 1 are assumed to have been obtained from
drawing a random sample of size 30 from a population of students.

The population being sampled can be described with respect to the two
variables x and y by means of the frequency function f(x, y) of those
variables. Now, suppose that the function f(x, y) contains a parameter
$\rho$ whose value serves to measure the extent to which x and y are linearly
related in a probability sense. Then r may be used to estimate the value
of $\rho$, just as a sample mean $\bar{x}$ is used to estimate the population mean $\mu$.
The parameter $\rho$ would, of course, be called the theoretical, or population,
correlation coefficient.

Frequency functions of two correlated variables will be studied in
Chapter 8. In particular, a frequency function, called the normal frequency
function, which contains a parameter $\rho$ of the type described in
the preceding paragraph, is introduced. It is shown in the next chapter
that r is the maximum likelihood estimator of $\rho$, provided that x and y
possess a normal frequency function. This demonstration constitutes
the justification for choosing r as given by (1) as a desirable measure of
linear correlation.

If it is assumed that x and y possess a normal frequency function, then
it is theoretically possible to derive the frequency function of the random
variable r, just as it is possible to derive the frequency function of $\bar{x}$ from
that of x. Both the form and the derivation of this frequency function
are too complicated to be considered here. It turns out that the frequency
function of r depends only on the parameters $\rho$ and n, where n is the
number of points in the scatter diagram. Graphs of the frequency function
of r for $\rho = 0$ and for $\rho = .8$ when n = 9 are shown in Fig. 4.

It is clear from Fig. 4 that the distribution of r is decidedly non-normal
for large values of $\rho$; consequently it will not suffice to obtain the standard
deviation of r and use it to determine the accuracy of r as an estimate of
$\rho$. Fortunately, there exists a simple change of variable which transforms
the complicated distribution of r into an approximately normal distribution.
The resulting normal distribution may then be used to determine
the accuracy of r as an estimate of $\rho$ in the same way that the normal
distribution of $\bar{x}$ was used to determine the accuracy of $\bar{x}$ as an estimate
of $\mu$. This change of variable is from r to z, where

(3) $$z = \frac{1}{2} \log_e \frac{1 + r}{1 - r}$$

It can be shown that when the preceding assumptions are satisfied, z will
be approximately normally distributed with mean

$$\mu_z = \frac{1}{2} \log_e \frac{1 + \rho}{1 - \rho}$$

and standard deviation

$$\sigma_z = \frac{1}{\sqrt{n - 3}}$$

As an illustration, consider the following problem. Is a correlation
of r = .20 between the face index and the cephalic index of 50 members
of a certain race significant? Set up the hypothesis that $\rho = 0$. Then
the variable z will be approximately normally distributed with $\mu_z = 0$
and $\sigma_z = 1/\sqrt{47} = .15$. If a significance level of .05 is taken and if the
two tails of this normal distribution are used as a critical region, a sample
value of r will be significant if it has a value of z such that $|z| > .30$.
Here,

$$z = \frac{1}{2} \log_e \frac{1.20}{.80} = .20$$

Since this value does not exceed the critical value, the value of r = .20 is
not significant. A value as large as this would be obtained fairly often in
random samples from a population in which the two variables are uncorrelated.

As a second illustration, consider the problem of determining an
interval of values within which r could reasonably be expected to fall
if $\rho = .8$ and if r is based on a sample of size 28. Let reasonably be understood
to mean with a probability of .95. The construction of such an
interval can be accomplished by first constructing such an interval for
z and then transforming it into an interval for r. The simplest interval
for z that possesses the desired property is the interval with end points
$z_1 = \mu_z - 2\sigma_z$ and $z_2 = \mu_z + 2\sigma_z$. For $\rho = .8$ and n = 28, it follows
from (3) that these end points are

$$z_1 = \frac{1}{2} \log_e 9 - \frac{2}{\sqrt{25}} = .70$$

$$z_2 = \frac{1}{2} \log_e 9 + \frac{2}{\sqrt{25}} = 1.50$$

If r is expressed as a function of z in relationship (3), it will be found that
r is the hyperbolic tangent of z. From tables of this function, or from
tables of exponentials, it will be found that $r_1 = .60$ and $r_2 = .91$. Thus
it can be stated that the probability is approximately .95 that the sample
correlation coefficient will satisfy the inequality .60 < r < .91 when r is
based on a sample of 28 and $\rho = .8$.
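
The same interval can be obtained without tables. The sketch below is illustrative only; the function name is hypothetical, and the end points are taken two standard deviations from the mean of z, as in the text. It relies on the fact that the inverse of transformation (3) is the hyperbolic tangent.

```python
import math

def interval_for_r(rho, n, multiple=2.0):
    """Approximate .95 interval for r, built on the z scale and mapped back.

    math.atanh(rho) equals (1/2) * log_e((1 + rho) / (1 - rho)), the mean of z.
    `multiple` = 2.0 follows the text; it is an approximation to 1.96.
    """
    mu_z = math.atanh(rho)
    sigma_z = 1.0 / math.sqrt(n - 3)
    z1 = mu_z - multiple * sigma_z
    z2 = mu_z + multiple * sigma_z
    # r is the hyperbolic tangent of z
    return math.tanh(z1), math.tanh(z2)

r1, r2 = interval_for_r(0.8, 28)
print("lower end point:", round(r1, 3))
print("upper end point:", round(r2, 3))
```

The end points should agree with the tabled values .60 and .91 up to the rounding of $z_1$ and $z_2$ to two decimal places.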


7.2 Linear Regression 

In the introduction to section 7.1, it was pointed out that correlation 
methods are used when one is interested in studying how two or more 
variables are interrelated. It often happens, however, that one studies 
the relationship between the variables in the hope that any relationship 
he finds can be used to assist in making estimates or predictions of one 
of the variables. Thus, if the two variables for Table 1 had been scores 
representing college aptitude x and college success y rather than the 
variables listed there, the relationship between x and y would have been 
useful for assisting one to predict a student’s college success from a knowl- 
edge of his score on a college-aptitude test. The correlation coefficient 
is not capable of solving such prediction problems; therefore, it is necessary
to introduce what are known as regression methods for handling those
problems. In this section linear regression methods for two variables
will be studied, whereas curvilinear regression and regression methods 
for several variables will be introduced in later sections. 

Consider the data of Table 2 on the amount of water applied in inches
and the yield of alfalfa in tons per acre on an experimental farm. The
graph of these data is given in Fig. 5. From this graph it appears that x
and y are approximately linearly related for this range of x values. For
the purpose of predicting y from x, it should therefore suffice to use a
linear function of x. Thus the problem of prediction first requires one to
solve the problem of fitting a straight line to a set of points.

Fig. 5. Hay yield as a function of amount of irrigation.


7.2.1 Least Squares 

The problem of fitting a curve to a set of points in some efficient manner
is essentially the problem of estimating the parameters of the curve in an
efficient manner. Although there are numerous methods for performing
the estimation of such parameters, the best known method is the following,
known as the method of least squares.


Table 2

Water (x)    12     18     24     30     36     42     48
Yield (y)    5.27   5.68   6.25   7.21   8.02   8.71   8.42


Since the desired curve is to be used for estimating, or predicting,
purposes, it is reasonable to require that the curve be such that it makes
the errors of estimation small. By an error of estimation, or prediction,
is meant the difference between an observed value of y and the corresponding
fitted curve value of y. If the value of the variable to be estimated
is denoted by y and the corresponding curve value by y', then
the error of estimation, or prediction, is given by y - y'. Since the errors
may be positive or negative and might add up to a small value for a poorly
fitting curve, it will not do to require merely that the sum of the errors
be as small as possible. This difficulty can be avoided by requiring that
the sum of the absolute values of the errors be as small as possible. However,
sums of absolute values are not convenient to work with mathematically;
consequently the difficulty is avoided by requiring that the
sum of the squares of the errors be a minimum. The values of the parameters
obtained by this minimization determine what is known as the
best fitting curve in the sense of least squares.
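
A small numerical check may make the argument concrete. The sketch below uses the data of Table 2 and compares two lines: a horizontal line through the mean yield, which is plainly a poor fit, and the line y' = .10x + 4.0 reported later in this section. The function name is hypothetical and the comparison is only an illustration of why the sum of the errors is an unsuitable criterion.

```python
# Data of Table 2: water applied (inches) and yield (tons per acre).
water = [12, 18, 24, 30, 36, 42, 48]
yields = [5.27, 5.68, 6.25, 7.21, 8.02, 8.71, 8.42]

def error_summaries(intercept, slope):
    """Return (sum of errors, sum of absolute errors, sum of squared errors)."""
    errors = [y - (intercept + slope * x) for x, y in zip(water, yields)]
    return (sum(errors),
            sum(abs(e) for e in errors),
            sum(e * e for e in errors))

y_bar = sum(yields) / len(yields)

# A horizontal line through the mean: its errors cancel almost exactly.
flat = error_summaries(y_bar, 0.0)

# The rounded least squares line quoted in the text.
fitted = error_summaries(4.0, 0.10)

print("horizontal line:", [round(v, 3) for v in flat])
print("fitted line:    ", [round(v, 3) for v in fitted])
```

The horizontal line has a sum of errors of essentially zero, yet its sum of squared errors is many times that of the fitted line, which is the difficulty the paragraph describes.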

Consider the application of this principle to the fitting of a straight
line to a set of n points. It is convenient to write the equation of any
nonvertical line in the form

$$y' = a + b(x - \bar{x}) \tag{4}$$

where b is its slope and a is the y intercept on the line $x = \bar{x}$. The y intercept
on the y axis is $a - b\bar{x}$. It will be seen shortly why it is so convenient
to express the equation of an arbitrary line in this form rather than in the
slope-intercept form, y = a + bx, of analytical geometry. The problem
now is to determine the parameters a and b so that the sum of the squares
of the errors of estimation will be a minimum. If the coordinates of the
ith point are denoted by $(x_i, y_i)$, this sum of squares will be $\sum_{i=1}^{n} (y_i - y_i')^2$.
When $y_i'$ is replaced by its value, as given by (4), it becomes clear that
this sum is a function of a and b only. If this function is denoted by
G(a, b), then

$$G(a, b) = \sum_{i=1}^{n} [y_i - a - b(x_i - \bar{x})]^2$$

If this function is to have a minimum value, it is necessary that its partial
derivatives vanish there; hence a and b must satisfy the equations

$$\frac{\partial G}{\partial a} = \sum 2[y - a - b(x - \bar{x})][-1] = 0$$

$$\frac{\partial G}{\partial b} = \sum 2[y - a - b(x - \bar{x})][-(x - \bar{x})] = 0$$


where the subscripts and range of summation have been omitted for convenience.
When the summations are performed term by term and the
sums that involve y are transposed, these equations assume the form

$$an + b\sum (x - \bar{x}) = \sum y$$

$$a\sum (x - \bar{x}) + b\sum (x - \bar{x})^2 = \sum (x - \bar{x})y$$

Since $\sum (x - \bar{x}) = 0$, the solution of these equations is given by

$$a = \bar{y} \quad \text{and} \quad b = \frac{\sum (x - \bar{x})y}{\sum (x - \bar{x})^2}$$



These values when inserted in (4) yield the desired least squares line. This
line is usually called the regression line of y on x; hence the preceding
derivation gives the result

$$\text{Regression Line:} \quad y' - \bar{y} = b(x - \bar{x}), \quad \text{where } b = \frac{\sum (x - \bar{x})y}{\sum (x - \bar{x})^2} \tag{7}$$

A pioneer in the field of applied statistics gave the least squares line this
name in connection with some studies he was making on estimating the
extent to which the stature of sons of tall parents reverts or regresses toward
the mean stature of the population.

For computational purposes, it is convenient to change the form of
b slightly in the following manner:

$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} \tag{8}$$
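
The step from (7) to the computing form (8) is not written out. A short derivation, which uses only the identities $\sum x = n\bar{x}$ and $\sum y = n\bar{y}$, may be sketched as follows.

```latex
\begin{align*}
\sum (x - \bar{x})\,y &= \sum xy - \bar{x}\sum y = \sum xy - n\bar{x}\bar{y}, \\
\sum (x - \bar{x})^2 &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 = \sum x^2 - n\bar{x}^2 .
\end{align*}
```

Dividing the first line by the second gives the expression for b in (8).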

Table 3 illustrates the computational procedure for the data of Table
2. Here (7) was used to calculate b instead of the suggested computing
formula (8) because $\bar{x}$ is so simple.

Table 3

   x        y       $x - \bar{x}$    $(x - \bar{x})y$    $(x - \bar{x})^2$
  12      5.27         -18              -94.86                 324
  18      5.68         -12              -68.16                 144
  24      6.25          -6              -37.50                  36
  30      7.21           0                0                      0
  36      8.02           6               48.12                  36
  42      8.71          12              104.52                 144
  48      8.42          18              151.56                 324

 210     49.56                          103.68                1008

As a result of these computations,
the equation of the regression line was found to be

$$y' = .10x + 4.0$$

The graph of this line is shown in Fig. 5. 
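
The computations of Table 3 can be reproduced with a short Python sketch. The function names are hypothetical; the first function follows formula (7) and the second follows the computing form (8), so that the two can be compared on the data of Table 2.

```python
# Data of Table 2: water applied (inches) and alfalfa yield (tons per acre).
water = [12, 18, 24, 30, 36, 42, 48]
yields = [5.27, 5.68, 6.25, 7.21, 8.02, 8.71, 8.42]

def regression_line(xs, ys):
    """Return (x_bar, y_bar, b) for the line y' - y_bar = b * (x - x_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Formula (7): b = sum((x - x_bar) * y) / sum((x - x_bar)^2)
    numerator = sum((x - x_bar) * y for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return x_bar, y_bar, numerator / denominator

def slope_computing_form(xs, ys):
    """Formula (8): b = (sum(xy) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    denominator = sum(x * x for x in xs) - n * x_bar ** 2
    return numerator / denominator

x_bar, y_bar, b = regression_line(water, yields)
intercept = y_bar - b * x_bar   # y intercept on the y axis, a - b * x_bar

print("x_bar =", x_bar, " y_bar =", round(y_bar, 2))
print("slope b =", round(b, 4))
print("intercept =", round(intercept, 3))
print("slope from (8) =", round(slope_computing_form(water, yields), 4))
```

Rounded to the precision used in the text, the slope and intercept agree with the line y' = .10x + 4.0.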

In fitting a straight line to a set of points, as in the preceding illustration,
it is intuitively assumed that the resulting line is an estimate of a
theoretical line of regression. The problem of determining the accuracy 
of the least-squares line as an estimate of the theoretical line of regression 
is considered in Chapter 11. Chapter 8, however, discusses theoretical 
lines of regression as models for empirical lines of regression. 



There is an important difference between the scatter diagrams of Fig. 1 
and Fig. 5 that should be noted. In Fig. 1 the points correspond to a
random sample of 30 students; consequently, both x and y are random
variables. In Fig. 5, however, the x values were chosen in advance, so
that y is the only random variable. Since least squares can be applied
whether the x values were fixed in advance or were obtained from random
samples, the regression approach to studying the linear relationship
between two variables is more flexible than the correlation approach.
The interpretation of r as a measure of the strength of the linear relationship
between two variables obviously does not apply if the values of x
are selected as desired because the value of r will usually depend heavily
on the choice of x values. In addition to being more flexible, regression
methods also possess the advantage of being the natural methods to use
in many experimental situations. The experimenter often wishes to change
x by uniform amounts over the range of interest for that variable rather
than take a random sample of x values. Thus, if he wanted to study the 
effect of an amino acid on growth, he would increase the amount of amino 
acid by a fixed amount, or factor, each time he ran the experiment. 


7.3 Multiple Linear Regression 

It happens quite often that the method of the preceding section for 
estimating one variable by means of a related variable yields poor results 
not because the relationship is far removed from the linear one assumed 
there but because there is no single variable related closely enough to 
the variable being estimated to yield good results. However, it may
happen that there are several variables that, when taken jointly, will serve
as a satisfactory basis for estimating the desired variable. Since linear
functions are so simple to work with and experience shows that many sets
of variables are approximately linearly related, it is reasonable to attempt
to estimate the desired variable by means of a linear function of the
remaining variables. For this purpose let $Y, X_1, X_2, \ldots, X_k$ represent
the available variables and consider the problem of estimating the variable
Y by means of a linear function of the remaining variables. If the variable
used to estimate Y is denoted by Y', the linear estimating function may
be expressed as

$$Y' = c_0 + c_1X_1 + c_2X_2 + \cdots + c_kX_k \tag{9}$$

where the c's are to be determined by means of available data.

As in the case of two variables, the unknown coefficients are estimated
by the method of least squares. This implies that n sets of values of the
k + 1 variables are available for obtaining the estimates. Geometrically,
the problem is one of finding the equation of the plane which fits best, in
the sense of least squares, a set of n points in k + 1 dimensions.

The problem now is to find the set of c's in (9) that will minimize the
sum $\sum_{i=1}^{n} (Y_i - Y_i')^2$. As in the case of two variables, it is more convenient
to work with variables measured from their sample means than with the
variables themselves; hence first let

$$y = Y - \bar{Y}, \qquad x_j = X_j - \bar{X}_j, \quad j = 1, 2, \ldots, k$$

If y' is defined by $y' = Y' - \bar{Y}$, then

$$Y - Y' = y + \bar{Y} - (y' + \bar{Y}) = y - y' \tag{10}$$

If now the capital X's and Y' in (9) are expressed in terms of the small
x's and y', that equation can be written in the form

$$y' = a_0 + a_1x_1 + a_2x_2 + \cdots + a_kx_k \tag{11}$$

where the a's could be expressed in terms of $\bar{Y}$, the c's, and the $\bar{X}$'s if
so desired. However, from (10) it is clear that minimizing $\sum (Y - Y')^2$ is
equivalent to minimizing $\sum (y - y')^2$; consequently one can just as well
determine the a's to minimize the latter sum, which because of (11) may
be written

$$G(a_0, a_1, \ldots, a_k) = \sum [y - a_0 - a_1x_1 - \cdots - a_kx_k]^2 \tag{12}$$

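If this function is to have a minimum value, it is necessary that its
partial derivatives vanish there; hence the a's must satisfy the equations

$$\frac{\partial G}{\partial a_i} = 0, \qquad i = 0, 1, \ldots, k$$

Differentiation of (12) produces the equations

$$\begin{aligned}
\sum 2[y - a_0 - a_1x_1 - \cdots - a_kx_k][-1] &= 0 \\
\sum 2[y - a_0 - a_1x_1 - \cdots - a_kx_k][-x_1] &= 0 \\
&\;\;\vdots \\
\sum 2[y - a_0 - a_1x_1 - \cdots - a_kx_k][-x_k] &= 0
\end{aligned}$$

If these equations are multiplied by 1/2, the summations performed term
by term, and the first sum transferred to the right side, these equations
will assume the form

$$\begin{aligned}
a_0n + a_1\sum x_1 + \cdots + a_k\sum x_k &= \sum y \\
a_0\sum x_1 + a_1\sum x_1^2 + \cdots + a_k\sum x_1x_k &= \sum x_1y \\
&\;\;\vdots \\
a_0\sum x_k + a_1\sum x_kx_1 + \cdots + a_k\sum x_k^2 &= \sum x_ky
\end{aligned} \tag{13}$$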



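Since $\sum x_i = 0$ for $i = 1, \ldots, k$ and $\sum y = 0$, all terms in
the first equation, except the first term, vanish. This implies that $a_0 = 0$,
and thus the number of equations to be solved has been reduced by 1.
The advantage of using variables measured from their sample means to
simplify the notation and solution of equations like (13) should be clear
from this result. The problem is now reduced to solving the equations

$$\begin{aligned}
a_1\sum x_1^2 + a_2\sum x_1x_2 + \cdots + a_k\sum x_1x_k &= \sum x_1y \\
a_1\sum x_2x_1 + a_2\sum x_2^2 + \cdots + a_k\sum x_2x_k &= \sum x_2y \\
&\;\;\vdots \\
a_1\sum x_kx_1 + a_2\sum x_kx_2 + \cdots + a_k\sum x_k^2 &= \sum x_ky
\end{aligned} \tag{14}$$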

Equations such as (13) or (14), which are obtained by the method of least 
squares, are commonly called normal equations. 

These equations are easily solved, provided that the number of equations 
is small. For large sets of equations much time is saved by using one of 
the compact computing schemes available for such problems. The most 
widely known is probably the Doolittle method, to which references are 
given at the end of this chapter. 

The derivation of (14) did not require that all the n values of $X_1$, nor of
the remaining variables, be different. It is necessary only that there be
a sufficient number of distinct values of the variables $X_1, X_2, \ldots, X_k$ to
determine uniquely the least-squares plane. Ordinarily, this means that
k + 1 distinct values will suffice because a plane in k + 1 dimensions is
determined by k + 1 points, provided that the k + 1 points do not lie
in a lower dimensional plane. For example, a plane in three dimensions
is determined by three points, provided that the three points do not lie
on a straight line.

As an illustration of the preceding methods, consider the problem of
estimating the amount of hay from a knowledge of the spring rainfall
and temperature, based on the following data. Here Y denotes the amount
of hay in units of 100 pounds per acre, $X_1$ the spring rainfall in inches,
and $X_2$ the accumulated temperature above 42°F in the spring. The data
gave the values

$$\bar{Y} = 28.0, \quad \bar{X}_1 = 4.91, \quad \bar{X}_2 = 594$$

$$\frac{\sum x_1y}{n} = 3.872, \quad \frac{\sum x_2y}{n} = -149.6, \quad \frac{\sum x_1x_2}{n} = -52.36$$

$$\frac{\sum x_1^2}{n} = 1.21, \quad \frac{\sum x_2^2}{n} = 7225$$

The normal equations (14) then become, after multiplying through by 1/n,

$$1.21a_1 - 52.36a_2 = 3.872$$

$$52.36a_1 - 7225a_2 = 149.6$$


The solution of these equations is $a_1 = 3.3$ and $a_2 = .004$; consequently
(11) becomes

$$y' = 3.3x_1 + .004x_2$$

This result when expressed in terms of the capital letters of formula (9)
yields

$$Y' = 9.4 + 3.3X_1 + .004X_2$$

This equation indicates that if $X_2$ is held fixed the amount of hay will
increase about 330 pounds per acre with each inch increase in spring
rainfall. On the other hand, if spring rainfall is held fixed, the accumulated
spring temperature would have to increase about 250 units, which
will be observed to be about three standard deviations for variable $X_2$, in
order to increase the amount of hay by 100 pounds per acre. Thus it
appears that the spring temperature is relatively unimportant compared 
with spring rainfall. Such conclusions, of course, are only approximately 
true. They depend on the variables being approximately linearly related, 
and they express only average relationships. They also assume that the 
function in (9) is a satisfactory estimate of a “true” linear regression func- 
tion for those variables. As in the case of linear regression for one inde- 
pendent variable, the problem of the accuracy of the coefficients in the 
least squares regression function as estimates of the coefficients in a 
theoretical regression function is postponed to Chapter 11. 
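
For two independent variables the normal equations (14) are a pair of linear equations and can be solved directly. The sketch below uses the summary values quoted above for the hay data; the helper `solve_2x2` is a hypothetical name and uses Cramer's rule, which is an assumption of convenience rather than the computing scheme the text has in mind.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*u + a12*v = b1 and a21*u + a22*v = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the normal equations are singular")
    u = (b1 * a22 - a12 * b2) / det
    v = (a11 * b2 - a21 * b1) / det
    return u, v

# Sums of squares and products divided by n, as quoted in the text.
s11, s12, s22 = 1.21, -52.36, 7225.0
s1y, s2y = 3.872, -149.6
y_bar, x1_bar, x2_bar = 28.0, 4.91, 594.0

# Normal equations (14) for k = 2, written with their natural signs.
a1, a2 = solve_2x2(s11, s12, s12, s22, s1y, s2y)

# Constant term of (9): c0 = Y_bar - a1 * X1_bar - a2 * X2_bar.
c0 = y_bar - a1 * x1_bar - a2 * x2_bar

print("a1 =", round(a1, 3))
print("a2 =", round(a2, 5))
print("c0 =", round(c0, 2))
```

The unrounded solution is close to, though not identical with, the rounded coefficients 3.3 and .004 quoted in the text, and the constant term is correspondingly close to 9.4.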

7.4 Curvilinear Regression

If a scatter diagram in the x, y plane indicates that a straight line will
not fit a set of points satisfactorily because of the nonlinearity of the
relationship, it may be possible to find some simple curve that will yield a
satisfactory fit. Since an investigator always strives to explain relationships
as simply as possible, with the restriction that his explanation be
consistent with previous knowledge, he will prefer to use a simple type of
curve. It follows, therefore, that the type of curve to use will depend
largely on the amount of theoretical information one has concerning
the relationship and thereafter on convenience.


7.4.1 Polynomial Regression 

If there are no theoretical reasons for expecting a curve of a certain
type to represent the relationship, polynomials are usually selected
because of their simplicity and flexibility. The lowest degree polynomial
that will suffice can often be determined by an inspection of the scatter
diagram. After the degree has been determined, the best-fitting polynomial
of that degree may then be fitted by the method of least squares.

Let the degree of the polynomial be k and let the equation of the polynomial
be written in the form

$$y' = a_0 + a_1x + a_2x^2 + \cdots + a_kx^k \tag{15}$$

The normal equations here need not be derived because they can be
obtained from the normal equations (13) of multiple regression by using
the original variables rather than the variables measured from their
sample means and letting $X_j = x^j$. This is permissible because the
derivation of equations (13) did not place any restriction on the nature
of the variables $X_1, X_2, \ldots, X_k$, and therefore they may be related in
any manner desired. With this choice of the X's, the normal equations
for polynomial regression become

$$\begin{aligned}
a_0n + a_1\sum x + \cdots + a_k\sum x^k &= \sum y \\
a_0\sum x + a_1\sum x^2 + \cdots + a_k\sum x^{k+1} &= \sum xy \\
&\;\;\vdots \\
a_0\sum x^k + a_1\sum x^{k+1} + \cdots + a_k\sum x^{2k} &= \sum x^ky
\end{aligned} \tag{16}$$

As in the case of multiple linear regression, if the number of equations 
is large, the equations should be solved by one of the compact computing 
schemes for such problems. From the discussion following (14), it 
follows that all n points of the scatter diagram for polynomial curve 
fitting need not have distinct x values. It will suffice to have k + 1 distinct
x values, since a polynomial of degree k is uniquely determined by
k + 1 points. In evaluating sums such as $\sum x^2$ it is understood that the
sum is over all the x values and not over just the distinct x values.
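
The normal equations (16) can be assembled mechanically from the power sums of x. The following sketch does so and solves them by ordinary Gaussian elimination; this generic solver is an assumption of convenience and is not the Doolittle scheme mentioned earlier. The function names and the small data set, which lies exactly on a made-up quadratic and repeats one x value, are hypothetical.

```python
def polynomial_normal_equations(xs, ys, k):
    """Coefficient matrix and right-hand side of equations (16) for degree k.

    Sums run over all n points, including repeated x values.
    """
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(k + 1)]
              for i in range(k + 1)]
    rhs = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k + 1)]
    return matrix, rhs

def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("too few distinct x values for this degree")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_polynomial(xs, ys, k):
    """Least squares coefficients a_0, ..., a_k of the polynomial (15)."""
    matrix, rhs = polynomial_normal_equations(xs, ys, k)
    return solve_linear_system(matrix, rhs)

# Made-up points on y = 1 + 2x + 3x^2; x = 1 occurs twice, which is allowed
# because there are still k + 1 = 3 distinct x values.
xs = [0, 1, 1, 2, 3]
ys = [1 + 2 * x + 3 * x * x for x in xs]
coefficients = fit_polynomial(xs, ys, 2)
print([round(c, 6) for c in coefficients])
```

Because the illustrative points lie exactly on a quadratic, the fitted coefficients should reproduce it up to rounding error.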

If the investigator is not certain what degree polynomial should be used 
in a given problem and wishes to compare different degree polynomials 
for their adequacy, he would prefer a fitting technique that requires 
little additional labor to increase the degree of the fitted polynomial by 
one unit. Such a technique is available if one uses orthogonal polynomials. 
These polynomials possess the desirable property of leaving unchanged 
the coefficients of the previously fitted polynomial when a higher degree 
term is added. If orthogonal polynomials are not used, the entire set of 
coefficients would have to be recomputed. Orthogonal polynomials are 
particularly convenient when there is but one value of y to each value of x 
and the x values are equally spaced. In the latter case, however, the
ordinary normal equations will simplify considerably if x is replaced by
$x - \bar{x}$ in (15) because then $\sum (x - \bar{x})^m = 0$ for m odd. The normal
equations (16) will then reduce to two sets of equations. Thus, if k = 5,
the six normal equations will reduce to two sets of three equations each.
The odd-numbered equations will involve only the unknowns $a_0$, $a_2$, and
$a_4$, whereas the even-numbered equations will involve only the unknowns
$a_1$, $a_3$, and $a_5$. The technique of how to use orthogonal polynomials may
be found in one of the references at the end of this chapter.
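
The vanishing of the odd power sums is easy to confirm numerically. The sketch below, with a hypothetical function name, uses the equally spaced water values of Table 2 purely as an illustration.

```python
def centered_power_sums(xs, max_power):
    """Return the sums of (x - x_bar)^m for m = 0, 1, ..., max_power."""
    x_bar = sum(xs) / len(xs)
    return [sum((x - x_bar) ** m for x in xs) for m in range(max_power + 1)]

# Equally spaced x values from Table 2, one observation at each value.
water = [12, 18, 24, 30, 36, 42, 48]
sums = centered_power_sums(water, 5)

for m, value in enumerate(sums):
    print("m =", m, " sum =", value)
```

The sums for m = 1, 3, and 5 are zero because the centered values occur in pairs of equal size and opposite sign, which is what separates the normal equations into two smaller sets.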


7.4.2 Other Regression Functions


In the preceding section it was pointed out that when there are no
theoretical reasons for preferring a certain type of regression function
polynomials are selected because of their simplicity and convenience.
There are numerous situations, however, in which the nature of the
relationship between two variables is known from theoretical considerations.
In such situations the fundamental regression problem is to obtain
estimates of the parameters that are needed to determine the equation
of the curve that represents the relationship. For example, the equation

$$pv^{\gamma} = \text{constant}$$

represents the relation between the pressure and volume of an ideal gas
undergoing adiabatic change. Here $\gamma$ is a parameter whose value depends
on the particular gas and for which an estimate may be obtained from
experimental data.

Another example of a nonpolynomial regression function is the function
often used in studying simple growth phenomena. If it is assumed
that the rate of growth of a biological population is proportional to its
size, then the regression function is a simple exponential function. To
verify this fact, let y denote the size of the population at time t. Then the
assumption concerning the rate of growth can be written in the form

$$\frac{dy}{dt} = cy$$

where c is the constant of proportionality. This equation is equivalent to

$$\frac{dy}{y} = c\,dt$$

Integration of both sides will yield

$$\log y = ct + k$$

where k is the constant of integration. Letting k = log b, this equation
simplifies to

$$y = be^{ct} \tag{17}$$



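Suppose, now, that one is given a set of n points $(t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n)$
representing the size of a growing population at the times $t_1, t_2, \ldots, t_n$.
If the parameters b and c are to be estimated by least squares,
it is necessary to minimize the function

$$G(b, c) = \sum_{i=1}^{n} (y_i - be^{ct_i})^2$$

Calculating the partial derivatives with respect to b and c and equating
them to 0 will yield the normal equations

$$\sum 2[y_i - be^{ct_i}][-e^{ct_i}] = 0$$

$$\sum 2[y_i - be^{ct_i}][-bt_ie^{ct_i}] = 0$$

These equations simplify to

$$\begin{aligned}
b\sum e^{2ct_i} &= \sum y_ie^{ct_i} \\
b\sum t_ie^{2ct_i} &= \sum t_iy_ie^{ct_i}
\end{aligned} \tag{18}$$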


The solution of these equations is very difficult and requires tedious 
numerical methods. This example illustrates what frequently occurs, 
namely, that the method of least squares for nonpolynomial regression 
often gives rise to normal equations that are difficult to solve. 

There are numerous other methods of fitting a curve to a set of points 
that can be employed when least squares gives rise to computational 
difficulties. One such method is to introduce new variables that are 
functions of the old variables in an effort to obtain a more tractable relationship.
Thus, in the preceding illustration, it is convenient to work
with the variable Y = log y rather than with y itself. If logarithms, to
the base e, of both sides of (17) are taken, then (17) becomes

$$\log y = \log b + ct$$

Then, letting Y = log y and a = log b, this relationship reduces to the
linear relationship

$$Y = a + ct$$

The problem has now been reduced to the problem of fitting a straight
line to a set of points in the t, Y plane and thus to a simple problem in
least squares. These least squares estimates for c and a may then be used
to yield estimates for c and b. The estimates for c and b obtained in this
manner differ, of course, from those obtained by solving the original least 
squares equations (18); however, the differences are usually quite small. 
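
The logarithmic device can be sketched as follows. The function name and the data are hypothetical; the data are generated exactly from a made-up growth curve, so the sketch shows only the mechanics and says nothing about how the two kinds of estimates compare on real observations.

```python
import math

def fit_exponential_by_logs(ts, ys):
    """Estimate b and c in y = b * exp(c * t) from a line fitted to Y = log y.

    The line Y = a + c * t is fitted by least squares in the t, Y plane,
    and b is then recovered from a = log b.
    """
    n = len(ts)
    log_ys = [math.log(y) for y in ys]
    t_bar = sum(ts) / n
    log_y_bar = sum(log_ys) / n
    c = (sum((t - t_bar) * Y for t, Y in zip(ts, log_ys))
         / sum((t - t_bar) ** 2 for t in ts))
    a = log_y_bar - c * t_bar
    return math.exp(a), c

# Made-up observations lying exactly on y = 2 * exp(0.3 * t).
ts = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.3 * t) for t in ts]

b, c = fit_exponential_by_logs(ts, ys)
print("b =", round(b, 6), " c =", round(c, 6))
```

Note that this minimizes the sum of squared errors in log y rather than in y, which is why the resulting estimates generally differ from the solution of (18).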

In studying the problem of determining the accuracy of estimates of
regression parameters, it is essential to know how the errors of estimation
are distributed. The type of assumption made about their distribution
will often determine whether to use direct least squares or to use least
squares on a modification of the relationship. The problem of the accuracy
of least squares estimates is considered in later chapters; however, it is
mentioned here to point out that least squares applied to a modification
of a regression relationship may sometimes be preferred to least squares
applied to the original relationship and therefore that such a modification
does not necessarily yield inferior estimates.

For some types of regression functions it is not possible to introduce
changes of variables that will reduce the problem to one for which the
least squares equations become tractable. For example, in studying
growth phenomena of a somewhat more complex nature than those
implied by (17), the modified exponential function $y = a + be^{ct}$ is often
used as a regression model. Taking logarithms will not help here because
of the parameter a. For functions such as this, other fitting procedures
are often used. The simplest procedure is to select three points which
appear to represent the trend of the data points and pass the curve through
those points. The three equations resulting from having the coordinates
of the points satisfy the equation would suffice to determine the three
parameters. There are other more refined methods that could also be
used here.
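
When the three selected points are equally spaced in t, the three equations can be solved in closed form. The sketch below rests on that added assumption, which is not made in the text, and uses a hypothetical function name and made-up points. With spacing h, the ratio of successive differences $(y_3 - y_2)/(y_2 - y_1)$ equals $e^{ch}$, which gives c, after which b and a follow directly.

```python
import math

def fit_modified_exponential(points):
    """Pass y = a + b * exp(c * t) through three equally spaced points.

    `points` is a list of three (t, y) pairs with t2 - t1 == t3 - t2.
    The successive differences of y must have the same sign and differ,
    otherwise no curve of this form passes through the points.
    """
    (t1, y1), (t2, y2), (t3, y3) = points
    h = t2 - t1
    if abs((t3 - t2) - h) > 1e-12 or h == 0:
        raise ValueError("the three t values must be equally spaced")
    ratio = (y3 - y2) / (y2 - y1)      # equals exp(c * h)
    if ratio <= 0 or ratio == 1:
        raise ValueError("points are not consistent with this curve")
    c = math.log(ratio) / h
    b = (y2 - y1) / (math.exp(c * t2) - math.exp(c * t1))
    a = y1 - b * math.exp(c * t1)
    return a, b, c

# Made-up points lying exactly on y = 5 + 2 * exp(0.4 * t).
points = [(t, 5.0 + 2.0 * math.exp(0.4 * t)) for t in (0.0, 2.0, 4.0)]
a, b, c = fit_modified_exponential(points)
print("a =", round(a, 6), " b =", round(b, 6), " c =", round(c, 6))
```

For points that are not equally spaced the three equations would have to be solved numerically.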


7.5 Linear Discriminant Functions

A problem that arises quite often in certain branches of science is that 
of discriminating between two groups of individuals or objects on the 
basis of several properties of those individuals or objects. For example,
a botanist might wish to classify a set of plants, some of which belong to
one species and the rest to a second species, into their proper species by
means of three or four measurements taken on each plant. If the two
species were fairly similar with respect to all those measurements, it
might not be possible to classify the plants correctly by means of any
single measurement because of a fairly large amount of overlap in the
distributions of this measurement for the two species; however, it might
be possible to find a linear combination of those various measurements
whose distributions for the two species would possess very little overlap.
This linear combination could then be used to yield a type of index
number by means of which plants of two species could be differentiated
with a high percentage of success. The procedure for discriminating
would consist in finding a critical value of the index such that any plant
whose index value fell below the critical value would be classified as belonging
to one species, otherwise to the other species.



The principal difference between a linear discriminant function and
an ordinary linear regression function arises from the nature of the 
dependent variable. A linear regression function uses values of the 
dependent variable to determine a linear function that will estimate 
the values of the dependent variable, whereas the discriminant function 
possesses no such values or variable but uses instead a two-way classifi- 
cation of the data to determine the linear function. 

Consider a set of k variables $x_1, x_2, \ldots, x_k$ by means of which it is
desired to discriminate between two groups of individuals. Let

$$z = \lambda_1x_1 + \lambda_2x_2 + \cdots + \lambda_kx_k \tag{19}$$

represent a linear combination of these variables. The problem then is
to determine the $\lambda$'s by means of some criterion that will enable z to
serve as an index for differentiating between members of the two groups.
For the purpose of simplifying the geometrical discussion of the problem,
consider two variables with $n_1$ and $n_2$ individuals, respectively, in the
two groups. The equation

$$z = \lambda_1x_1 + \lambda_2x_2$$

then represents a plane in three dimensions passing through the origin
and having direction numbers $\lambda_1$, $\lambda_2$, and -1. If the two sets of points
corresponding to the two groups of individuals are such that they can be
separated by means of a plane through the origin, as shown in Fig. 6, it
is clear that the values of z corresponding to the two groups will assume
increasingly large negative and positive values as the separating plane
approaches perpendicularity to the $x_1, x_2$ plane. At the same time, however,
the variation of the values of z within a group becomes increasingly
large for both groups; consequently the increase in the separation of the
values of z for the two groups occurs at the expense of an increase in the
separation of the values of z within each group. This situation corresponds
to that in which the means of two distributions are separating but
for which the standard deviations are increasing to such an extent that
greater discrimination between the two distributions does not necessarily
result. It would be desirable, therefore, to choose a plane that separates
the values of z for the two groups as widely as possible relative to the variation
of the values of z within the two groups. As a measure of the separation
of the two groups, it is convenient to use $(\bar{z}_1 - \bar{z}_2)^2$, where $\bar{z}_1$ and
$\bar{z}_2$ are the means of the two groups. As a measure of the variation of the
values of z within the two groups, it is convenient to use
$\sum_{i=1}^{2} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_i)^2$.
Here $z_{ij}$ denotes the z value of the jth individual in the ith group, where
i = 1 or 2. Then the desired plane will be that plane for which the $\lambda$'s are
determined to maximize the function

$$G = \frac{(\bar{z}_1 - \bar{z}_2)^2}{\sum_{i=1}^{2} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_i)^2} \tag{20}$$

Fig. 6. Example of a discriminating plane.

Although the arguments leading to (20) were elucidated by means of 
two variables and three-dimensional geometry, they hold equally well for 
k variables; consequently the solution of the problem will be carried out 
for the general case. 

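Let $x_{pij}$ represent the value of $x_p$ for the jth individual in the ith group
and let $\bar{x}_{pi}$ represent the mean value of $x_p$ for the individuals in that
group. Then from (19) it follows that

$$\bar{z}_1 - \bar{z}_2 = \lambda_1(\bar{x}_{11} - \bar{x}_{12}) + \cdots + \lambda_k(\bar{x}_{k1} - \bar{x}_{k2}) \tag{21}$$

and

$$z_{ij} - \bar{z}_i = \lambda_1(x_{1ij} - \bar{x}_{1i}) + \cdots + \lambda_k(x_{kij} - \bar{x}_{ki}) \tag{22}$$

If $d_p = \bar{x}_{p1} - \bar{x}_{p2}$, it follows from (21) that

$$(\bar{z}_1 - \bar{z}_2)^2 = (\lambda_1d_1 + \cdots + \lambda_kd_k)^2 = \sum_{p=1}^{k} \sum_{q=1}^{k} \lambda_p\lambda_qd_pd_q$$

If $S_{pq} = \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi})$, it follows from (22) that

$$\begin{aligned}
\sum_{i=1}^{2} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_i)^2
&= \sum_{i=1}^{2} \sum_{j=1}^{n_i} [\lambda_1(x_{1ij} - \bar{x}_{1i}) + \cdots + \lambda_k(x_{kij} - \bar{x}_{ki})]^2 \\
&= \sum_{i=1}^{2} \sum_{j=1}^{n_i} \sum_{p=1}^{k} \sum_{q=1}^{k} \lambda_p\lambda_q(x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi}) \\
&= \sum_{p=1}^{k} \sum_{q=1}^{k} \lambda_p\lambda_q \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi}) \\
&= \sum_{p=1}^{k} \sum_{q=1}^{k} \lambda_p\lambda_qS_{pq}
\end{aligned}$$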

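When these values are inserted in (20), it reduces to

$$G = \frac{A}{B} = \frac{\sum_{p=1}^{k} \sum_{q=1}^{k} \lambda_p\lambda_qd_pd_q}{\sum_{p=1}^{k} \sum_{q=1}^{k} \lambda_p\lambda_qS_{pq}} \tag{23}$$

Since the $\lambda$'s are to be determined to make G a maximum, it is necessary
that $\partial G/\partial \lambda_r = 0$ for $r = 1, \ldots, k$ at the maximizing point. This requirement
may be expressed in the form

$$\frac{\partial G}{\partial \lambda_r} = \frac{B\,\dfrac{\partial A}{\partial \lambda_r} - A\,\dfrac{\partial B}{\partial \lambda_r}}{B^2} = 0, \qquad r = 1, \ldots, k$$

which is equivalent to

$$\frac{\partial B}{\partial \lambda_r} = \frac{1}{G}\,\frac{\partial A}{\partial \lambda_r}, \qquad r = 1, \ldots, k \tag{24}$$

For ease of differentiating, it is convenient to write out B in the form

$$\begin{aligned}
B = {} & \lambda_1\lambda_1S_{11} + \cdots + \lambda_1\lambda_rS_{1r} + \cdots + \lambda_1\lambda_kS_{1k} \\
& + \cdots \\
& + \lambda_r\lambda_1S_{r1} + \cdots + \lambda_r\lambda_rS_{rr} + \cdots + \lambda_r\lambda_kS_{rk} \\
& + \cdots \\
& + \lambda_k\lambda_1S_{k1} + \cdots + \lambda_k\lambda_rS_{kr} + \cdots + \lambda_k\lambda_kS_{kk}
\end{aligned}$$

It will be observed that $\lambda_r$ occurs as a common factor of both the rth row
and the rth column. Since $S_{rj} = S_{jr}$, it therefore follows that

$$\frac{\partial B}{\partial \lambda_r} = 2(\lambda_1S_{r1} + \cdots + \lambda_kS_{rk})$$

Similarly,

$$\frac{\partial A}{\partial \lambda_r} = \frac{\partial}{\partial \lambda_r}(\lambda_1d_1 + \cdots + \lambda_kd_k)^2 = 2(\lambda_1d_1 + \cdots + \lambda_kd_k)d_r$$

If these expressions are inserted in (24), it will reduce to

$$\lambda_1S_{r1} + \lambda_2S_{r2} + \cdots + \lambda_kS_{rk} = cd_r, \qquad r = 1, \ldots, k \tag{25}$$

where $c = (\lambda_1d_1 + \cdots + \lambda_kd_k)/G$ is independent of r.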


Since

$$S_{pq} = \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi}) \tag{26}$$

and

$$d_p = \bar{x}_{p1} - \bar{x}_{p2} \tag{27}$$

are numerical quantities in any given problem, the necessary conditions
(25) constitute a set of k linear equations in the $\lambda$'s. The solution of these
equations determines the $\lambda$'s except for the unknown factor c. Although
c is actually a function of the $\lambda$'s here, so that the solution of these equations
expresses each $\lambda_i$ as a constant times this function, the factor c
cancels out from numerator and denominator of G when these values are
substituted in (23). Thus there is no unique set of $\lambda$'s maximizing G, and
any multiple of a set of $\lambda$'s satisfying equations (25) will do just as well.
From (19) it is clear that such a multiple can be ignored because the two
sets of z's would merely be multiplied by this constant factor and thus
would be equivalent as far as discriminating between the two groups is
concerned. As a matter of fact, it is usually convenient to choose c = 1,
solve the equations, and then reduce (19) to the form in which one of the
$\lambda$'s, say $\lambda_1$, is unity.

Table 4

Race A   $x_1$   6.36  5.92  5.92  6.44  6.40  6.56  6.64  6.68  6.72  6.76  6.72
         $x_2$   5.24  5.12  5.36  5.64  5.16  5.56  5.36  4.96  5.48  5.60  5.08

Race B   $x_1$   6.00  5.60  5.64  5.76  5.96  5.72  5.64  5.44  5.04  4.56  5.48  5.76
         $x_2$   4.88  4.64  4.96  4.80  5.08  5.04  4.96  4.88  4.44  4.04  4.20  4.80

As an illustration of the use of this function, consider the data of
Table 4 on the mean numbers of teeth found on the proximal ($x_1$) and
distal ($x_2$) combs of two races of insects. The problem here is to
discriminate between members of the two races by means of the two indicated
variables.





Computations give $S_{11} = 2.68$, $S_{12} = 1.29$, $S_{22} = 1.75$, $d_1 = 0.915$,
$d_2 = 0.597$; consequently if c is chosen equal to 1, (25) becomes

$$2.68\lambda_1 + 1.29\lambda_2 = 0.915$$
$$1.29\lambda_1 + 1.75\lambda_2 = 0.597$$

The solution of these equations is $\lambda_1 = 0.274$ and $\lambda_2 = 0.139$. If these
values are used, the linear discriminant function (19) becomes

$$z = 0.274x_1 + 0.139x_2$$

For the purpose of computing values of z, it is more convenient to
choose c so that either $\lambda_1$ or $\lambda_2$ equals 1. If c is chosen to make $\lambda_1$ equal 1,
this discriminant function reduces to

$$z = x_1 + 0.507x_2$$

The values of z corresponding to the various members of the two races
given in Table 4 are as follows:

Race A   9.02  8.52  8.64  9.30  9.02  9.38  9.36  9.19  9.50  9.60  9.30

Race B   8.47  7.95  8.15  8.19  8.54  8.28  8.15  7.91  7.29  6.61  7.61  8.19


It will be noted that the two races are segregated by means of z except 
for the slight overlap found in the second entry for Race A and the fifth 
entry for Race B. 
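
The computation above can be retraced numerically. The following is only a sketch: the function names are hypothetical, the data are those of Table 4 as read here (the seventh $x_1$ entry for Race B is taken as 5.64, which agrees with the tabulated z value), and equations (25) are solved with c = 1 by Cramer's rule. Because the sums are not rounded before solving, the weights may differ from the printed 0.274, 0.139, and 0.507 in the later decimals.

```python
# Data of Table 4: (x1, x2) pairs for each race.
race_a = [(6.36, 5.24), (5.92, 5.12), (5.92, 5.36), (6.44, 5.64), (6.40, 5.16),
          (6.56, 5.56), (6.64, 5.36), (6.68, 4.96), (6.72, 5.48), (6.76, 5.60),
          (6.72, 5.08)]
race_b = [(6.00, 4.88), (5.60, 4.64), (5.64, 4.96), (5.76, 4.80), (5.96, 5.08),
          (5.72, 5.04), (5.64, 4.96), (5.44, 4.88), (5.04, 4.44), (4.56, 4.04),
          (5.48, 4.20), (5.76, 4.80)]


def mean(values):
    return sum(values) / len(values)


def pooled_sum(groups, p, q):
    """S_pq: within-group sums of cross products, added over the groups."""
    total = 0.0
    for g in groups:
        mp = mean([row[p] for row in g])
        mq = mean([row[q] for row in g])
        total += sum((row[p] - mp) * (row[q] - mq) for row in g)
    return total


def discriminant_weights(a, b):
    """Solve equations (25) with c = 1 for k = 2 by Cramer's rule."""
    groups = [a, b]
    s11 = pooled_sum(groups, 0, 0)
    s12 = pooled_sum(groups, 0, 1)
    s22 = pooled_sum(groups, 1, 1)
    d1 = mean([r[0] for r in a]) - mean([r[0] for r in b])
    d2 = mean([r[1] for r in a]) - mean([r[1] for r in b])
    det = s11 * s22 - s12 * s12
    return (d1 * s22 - s12 * d2) / det, (s11 * d2 - s12 * d1) / det


lam1, lam2 = discriminant_weights(race_a, race_b)
ratio = lam2 / lam1  # weight of x2 once the weight of x1 is scaled to 1
z_a = [x1 + ratio * x2 for x1, x2 in race_a]
z_b = [x1 + ratio * x2 for x1, x2 in race_b]
print(round(lam1, 3), round(lam2, 3), round(ratio, 3))
print("smallest z in Race A:", round(min(z_a), 2))
print("largest z in Race B:", round(max(z_b), 2))
```

Comparing the smallest z for Race A with the largest z for Race B is a quick way to see how much the two groups overlap.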

As presented in this section, a linear discriminant function is constructed
for the purpose of classifying future observations into their proper
group. Thus the problem is essentially one of estimating the $\lambda$'s. This
function could be used as a device for testing the hypothesis that the two
groups differ in the manner described earlier; however, there are other
more natural methods for treating the latter problem.


REFERENCES 

Additional material on correlation and in particular on the z transformation may be 
found in R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd. 

The Doolittle method for solving a set of linear equations may be found in the pre- 
ceding reference or in Croxton and Cowden, Applied General Statistics, Prentice-Hall. 

The technique of orthogonal polynomials is explained in the first reference above or in 
Anderson and Bancroft, Statistical Theory in Research, McGraw-Hill Book Co. 




EXERCISES 

1. Calculate the value of r for the following data on the heights (x) and weights
(y) of 12 college students.


X 

63 

72 

70 

68 

66 

69 

74 70 

63 

72 6; 

5 "'71 

y 

124 

184 

161 

164 

140 

154 

210 164 

126 

172 d 

;3 150 


2. What interpretation would you make if told that the correlation between
the number of automobile accidents per year and the age of the driver is
r = -.60 if only drivers with at least 1 accident are considered?

3. What explanation would you give if told that the correlation between
fertilizer added and profit made in raising vegetables on a certain experimental
farm was only .20?

4. What would be the effect on the value of r for the correlation between
height and weight of males of all ages if only males in the 20-25 age group were
sampled? Observe what effect this restriction would have on the scatter diagram.

5. Explain why it would not be surprising to find a high correlation between
the density of traffic on Wall Street and the height of the tide in Maine if
observations were taken every hour from 6:00 a.m. to 10:00 p.m. and high tide occurred
at 8:00 a.m. Plot a scatter diagram to assist in the explanation.

6. How large a correlation coefficient is needed for a sample of size 25 before
one is justified in claiming that $\rho \neq 0$?

7. Test the hypothesis that $\rho = .7$ if a sample of 50 gave r = .6.

8. Prove that == where « and v are the integral variables introduced in 

7.1.2. j: 

9. For the data of problem 1, find the equation of the regression line of y on x.

10. Derive the least-squares equations for fitting a curve of the type y = 
aa: -f to a set of w points. 

11. Derive the least-squares equations for fitting t/ = to a set of n 

points. 

12. The following data are for tensile strength (100 lb/in.$^2$) and hardness
(Rockwell E) of die-cast aluminum. Find the equation of the regression line
with y chosen as tensile strength.


Tensile strength (2/) 

293 

349 

368 

301 

340 308 

354 

313' '312: ■ 

33T 

Hardness (x) 

53 

70 

84 

55 

78 64 

71 




Tensile strength (2/) 

377 

247 

348 

298 

287 ‘292 

345 

0 

00 

258 

Hardness {x) 

70 

56 

86 

60 

72 51 

88 

95 5I 

75 













Tensile strength (y) 

265 

281 

246 

258 

237 

286 

324 

282 

340 

Hardness (x) 

54 

78 

52 

69 

54 

64 

83 

56 

70 


13. Show that the equation of the simple regression line can be written in the
form $y' - \bar{y} = r(s_y/s_x)(x - \bar{x})$.

14. Use the formula in problem 13 to show that the correlation coefficient
between the regression line values $y_i'$ and the observed values $y_i$ is equal
to r.

15. The following data are for the 3 variables honor points (y), general
intelligence scores ($x_1$), and hours of study ($x_2$). Find the equation of the
regression plane of y on $x_1$ and $x_2$, given

= '^x^x 2=33, ^00^^ =36, ^x^y = 106, '^x^y = 11 

16. Prove that $s_{x-y}^2 = s_x^2 + s_y^2 - 2 r s_x s_y$.

17. Find an expression for $s_x^2$ for the first n positive integers by using a familiar
formula for the sum of the squares of those integers.

18. If x and y denote the ranks of an individual with respect to two characters
for a group of n individuals, derive the formula $r = 1 - 6\sum (x - y)^2 / n(n^2 - 1)$
for the correlation of the 2 ranked variables by using the results in problems 16
and 17. Calculate r by means of this formula for the data of Table 1 and compare
with the regular r value of .66. Replace tied ranks by the mean of the ranks
involved.

19. The following data are for intelligence-test scores, grade-point averages, 
and reading-rates of students. 

(a) Calculate r between I.T. scores and G.P.A. 

(b) Find the equation of the regression line of G.P.A. on I.T. scores, 

(c) Find the equation of the regression plane of G.P.A. on I.T. scores and R.R. 

(d) By comparing errors of estimation, determine whether (c) is considerably 
better than (b) for estimating G.P.A. 


I.T. 

295 

152 

214 

171 

131 

178 

225 

141 

116 

173 

G.P.A. 

2.4 

.6 

.2 

0 

1 

.6 

1 

.4 


2.6 

R.R. 

41 

18 

45 

29 

28 

38 

25 

26 

22 

37 


I.T. 

230 

195 

174 

177 

210 

236 

198 

217 

143 

186 

G.P.A. 

2.6 

0 

1.8 

0 

.4 

1.8 

.8 

1 

.2 

2.8 

R.R. 

39 

38 

24 

32 

26 

29 

34 

38 

40 

27 














135 146 227 204 223 142 176 

G.P.A. 1.4 1.2 1.4 1.4 1.4 .8 .8 2.6 2.6 .2 

R.R. 26 19 35 26 18 22 23 27 40" 33 


20. The following data give the velocity of the Mississippi River in feet per
second corresponding to various depths expressed in terms of the ratio D of the
measured depth to the depth of the river. (a) Fit a parabola $V = a + bD + cD^2$
to the data, choosing a convenient origin. (b) Find V when D = .9 (observed
V = 2.976). (c) When would you consider extrapolation as used in (b) a valid
procedure?


D 

V 



3.195 3.230 3.253 3:261 3.252 3.228 3.181 3.127 3.^9 


21. The following data are for a growing plant. (a) Plot the data on ordinary,
semilog, and log-log paper and verify that it is most nearly linear on semilog
paper. (b) Fit a simple exponential to the data by first taking logarithms of the
exponential equation.


Day 0 1 2 3 4 5 6 7 ^8 

Height .75 1.20 1.75 2.50 3.45 4.70 6.20 ' 8.25“ itSO 


22. The pressure of a gas and its volume are related by an equation of the
form $pv^a = b$. In a certain experiment the following values were obtained.
Determine a and b by least squares on the logarithmic equation.


p    .5     1     1.5    2     2.5    3
v    1.62   1     .75    .62   .52    .46


23. Suppose that the exponential 2/ = is the proper curve to fit to a set of 
points. If the parameters a and b are determined by least squares applied to the 
logarithm of this equation and also by least squares directly, which method is
more likely to be heavily influenced by a point with an unusually large value of x?

















24. Derive the least-squares equations for fitting a modified exponential y == 
c + to a set of n points and indicate why these equations would be difficult 
to solve. 

25. Explain how the least-squares equations for multiple linear regression are 
also applicable if second-degree terms in addition to the first-degree terms in the 
k variables are introduced in the regression equation. 

26. Derive the equations that would need to be solved if one were to estimate
a and b in the equation of the regression line by requiring that the sum of the
squares of the perpendicular distances to the regression line be a minimum.

27. Classify the individuals of problem 19 into 1 of 2 groups on the basis of 
having a G.P.A. less than or greater than .9. (a) Using the remaining variables, 
find the equation of the discriminant function for classifying individuals into the 
proper G.P.A. group. (b) Calculate the values of z for the individuals and note
whether the discriminant function does appreciably better than either variable 
alone. 

28. Two polynomials $P_i(x)$ and $P_j(x)$ of degrees i and j, respectively, are said
to be orthogonal on a set of points $x_1, \ldots, x_n$ provided that
$\sum_{k=1}^{n} P_i(x_k) P_j(x_k) = 0$, $i \neq j$. A polynomial $P_i(x)$ is said to be
normalized on the set if $\sum_{k=1}^{n} P_i^2(x_k) = 1$.
For the points x = 0, 1, 2, 3, 4, find an orthogonal normalized (orthonormal) set
of polynomials $P_0(x)$, $P_1(x)$, $P_2(x)$.

29. Assuming the properties defined in problem 28, obtain the least-squares
equations for the coefficients of the polynomial $y = a_0 P_0(x) + a_1 P_1(x) + \cdots
+ a_k P_k(x)$ and show that their solution for a particular coefficient is the same
regardless of the degree of the polynomial, provided, of course, that i < k.

30. For linear regression involving more than 2 variables, the multiple cor- 
relation coefficient is defined as the correlation between the observed values yi 
and their estimates $y_i'$. It is designed to measure the extent to which the linear
regression function is capable of predicting the dependent variable y. Calculate 
the value of the multiple correlation for the data of problem 19 if the answer to 
19(c) is 2 /' = .0\lx^ — .OOTcPg — .97. 

31. The partial correlation coefficient between variables $x_i$ and $x_j$ is defined as
the correlation between the values $x_i - x_i'$ and $x_j - x_j'$, where $x_i'$ is the
regression value of $x_i$ on all variables except $x_j$ and $x_j'$ is the regression value of $x_j$
on all variables except $x_i$. It is designed to measure the correlation between the
variables $x_i$ and $x_j$ when the effects of the remaining variables have been
eliminated. Calculate the partial correlation between G.P.A. and R.R. for problem 19,
if the answer to 19(b) is $y' = .0116x_1 - 1.11$. It will first be necessary to work
19(b) for R.R. and I.T.



CHAPTER 8


Theoretical Frequency Distributions for
Correlation and Regression


Frequency functions of two variables were defined in Chapter 2 for
both discrete and continuous variables. Although a number of their
properties were discussed there, it is now necessary to consider additional
properties if mathematical models for empirical frequency distributions
of two variables, such as those encountered in Chapter 7 under
correlation, are to be constructed. These properties will turn out to be
essential in the construction of regression models as well.


8.1 Continuous Distributions of Two Variables 

Since correlation and regression as defined in the preceding chapter
involve pairs of continuous variables, it will be necessary to study
properties of joint frequency functions for such variables. In this connection,
it will be found that theoretical moments are particularly useful for a
theory of correlation, whereas the notions of marginal and conditional
distributions are needed for a theory of regression. These two types
of distributions have already been defined for discrete variables in
Chapter 2.

The geometrical representation of $f(x, y)$ as a surface in three dimensions
as displayed in Fig. 12, Chapter 2, is convenient for interpreting
probability as a volume under the surface; however, in discussing correlation,
and marginal and conditional distributions, it is more convenient to
think of $f(x, y)$ as giving the density distribution of probability mass over
the plane, with the total mass being equal to 1. This was easy to do in
2.12 for discrete variables because only a finite number of mass points
was involved. Here, however, it is necessary to conceive of a continuous
distribution of mass such as in a sheet of metal. The density of the metal
sheet at a point $(x, y)$ is given by $f(x, y)$ and the mass of the entire sheet
is equal to 1. Figure 1 attempts to portray this density interpretation for
the frequency function that is graphed as a surface in Fig. 12, Chapter 2.

Fig. 1. Probability density distribution.

From the density point of view, the probability that a single sample
will yield a point $(x, y)$ lying in a given rectangle is equal to the mass of
the rectangle. This interpretation of probability, as well as the volume
interpretation, clearly holds for regions other than rectangles in the x,y
plane.


8.1.1 Marginal Distributions 

Models for regression will be discussed before those for correlation 
because correlation models require some of the material needed for regres- 
sion. A general theoretical regression curve can be defined by means of a 
conditional distribution. It in turn can be defined by means of a marginal 
distribution; therefore consider such distributions for continuous vari- 
ables next. 

For the purpose of obtaining a formula for continuous variables
corresponding to (28), Chapter 2, let $f(x, y)$ be the joint frequency function
of any two continuous random variables and consider the following
equalities.

$$P\{\alpha < x < \beta\} = P\{\alpha < x < \beta,\ -\infty < y < \infty\} = \int_{\alpha}^{\beta} \int_{-\infty}^{\infty} f(x, y)\, dy\, dx = \int_{\alpha}^{\beta} h(x)\, dx$$

where, as indicated,

(1) $h(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$

Now, if x is considered independently of y, then by definition

$$P\{\alpha < x < \beta\} = \int_{\alpha}^{\beta} f(x)\, dx$$

where $f(x)$ is the frequency function of x alone. If these two expressions
for $P\{\alpha < x < \beta\}$ are equated,

(2) $\int_{\alpha}^{\beta} h(x)\, dx = \int_{\alpha}^{\beta} f(x)\, dx$

Since this equality is to hold for all intervals $(\alpha, \beta)$, $\alpha$ may be held fixed
and $\beta$ allowed to vary, in which event these integrals may be treated as
functions of $\beta$. By the well-known calculus formula that has been used
before, if

$$F(\beta) = \int_{\alpha}^{\beta} f(x)\, dx$$

then

$$\frac{dF(\beta)}{d\beta} = f(\beta)$$

If both sides of (2) are differentiated with respect to $\beta$, this formula will
give

$$h(\beta) = f(\beta)$$

Since this is an identity in $\beta$, it follows that the function $h(x)$ defined by
(1) is the frequency function $f(x)$. These arguments therefore show that
the marginal frequency function $f(x)$ is given by the following formula.

(3) Marginal Distribution: $f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$

This formula is the continuous analogue of formula (28), Chapter 2, 
for the discrete case. In a similar manner the integration of f(x, y) with 
respect to x from $-\infty$ to $+\infty$ will yield the y marginal frequency function
g{y). From the density point of view f{x) may be thought of as giving 
the probability density distribution along the x axis after the entire 
probability mass in the plane has been projected perpendicularly onto 
the X axis. 

As a simple illustration of how formula (3) applies, consider the joint
frequency function

(4) $f(x, y) = \begin{cases} 2 - x - y, & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{elsewhere} \end{cases}$

Here, formula (3) gives

(5) $f(x) = \int_{0}^{1} (2 - x - y)\, dy = \tfrac{3}{2} - x$

and

$$g(y) = \int_{0}^{1} (2 - x - y)\, dx = \tfrac{3}{2} - y$$

As a second illustration, consider the following frequency function:

$$f(x, y) = \begin{cases} \tfrac{1}{2}xy, & 0 < x < 2,\ 0 < y < x \\ 0, & \text{elsewhere} \end{cases}$$

In this problem the sample space is the triangle bounded by the lines
$x = 2$, $y = x$, and $y = 0$. Although the limits in formula (3) are written
$-\infty$ and $\infty$, this is merely for notational convenience and it is understood
that when the limits are not infinite one must determine the limits from
the sample space boundaries. The limits of integration in this problem
certainly depend on the chosen value of x. Formula (3) gives

$$f(x) = \int_{0}^{x} \tfrac{1}{2}xy\, dy = \frac{x^3}{4}, \qquad 0 < x < 2$$

Similarly,

$$g(y) = \int_{y}^{2} \tfrac{1}{2}xy\, dx = \tfrac{1}{4}y(4 - y^2), \qquad 0 < y < 2$$
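
Formula (3) can be checked numerically for both illustrations. The sketch below is an assumption-laden aid, not part of the text's argument: the helper names are hypothetical, the integral is approximated by a simple midpoint rule, and the second joint function is taken to be $\tfrac{1}{2}xy$ on the triangle, the form consistent with the marginals $x^3/4$ and $\tfrac{1}{4}y(4 - y^2)$.

```python
def midpoint_integral(g, lo, hi, n=2000):
    """Approximate the integral of g over [lo, hi] by the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def f_first(x, y):
    """Joint frequency function (4) on the unit square."""
    return 2 - x - y if 0 < x < 1 and 0 < y < 1 else 0.0


def f_second(x, y):
    """Second illustration, assumed to be xy/2 on the triangle 0 < y < x < 2."""
    return 0.5 * x * y if 0 < x < 2 and 0 < y < x else 0.0


def marginal_x(f, x, y_lo, y_hi):
    """Formula (3): integrate the joint function over y for a fixed x."""
    return midpoint_integral(lambda y: f(x, y), y_lo, y_hi)


# Fixed limits for the square; the upper limit depends on x for the triangle.
x = 0.3
print(marginal_x(f_first, x, 0.0, 1.0), 1.5 - x)
x = 1.2
print(marginal_x(f_second, x, 0.0, x), x ** 3 / 4)
```

The second call shows the point made in the text: the limits passed to the integrator must come from the sample space boundary, here 0 to x.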


8.1.2 Conditional Distributions


Now that marginal distributions have been determined, it is possible
to proceed with the problem of defining conditional distributions for
continuous variables. For the purpose of obtaining a formula for continuous
variables corresponding to (29), Chapter 2, consider the function
defined by

(6) $\dfrac{f(x, y)}{f(x)}$

If x is held fixed and is such that $f(x) > 0$, then (6) defines a non-negative
function of y for which, in view of (3),

$$\int_{-\infty}^{\infty} \frac{f(x, y)}{f(x)}\, dy = \frac{1}{f(x)} \int_{-\infty}^{\infty} f(x, y)\, dy = 1$$

Thus, according to (31), Chapter 2, $f(x, y)/f(x)$ has properties that enable
it to serve as a frequency function for y when x is fixed as indicated.
Because of this property, $f(x, y)/f(x)$ is called the conditional frequency
function of y for fixed x and is denoted by $f(y \mid x)$. This definition may
be expressed as follows.

(7) Conditional Distribution: $f(y \mid x) = \dfrac{f(x, y)}{f(x)}$

By going back to the definition of conditional probability for events as
given by (6), Chapter 2, and working with integrals, it is possible to derive
(7) directly in a natural manner; however (7) is treated here as a definition.
Formula (7) is identical with the corresponding formula for the discrete
case. As in the case of discrete variables, the conditional distribution of
y as given by $f(y \mid x)$ is sometimes called the x array distribution. The
conditional frequency function of x for y fixed is defined in a similar
manner.

From a density point of view $f(y \mid x)$ may be thought of as giving the
probability density distribution along the vertical line in the x,y plane
corresponding to the fixed value of x, the total mass of this line being equal
to 1. The frequency function $f(x, y)$ as it stands could not be used as a
probability density function along such a line because by (3) it would not
give a total probability mass of one for the entire line unless $f(x)$ happened
to be equal to 1. The factor $1/f(x)$ insures that the total mass of the line
will be 1.

In the surface representation of $f(x, y)$ the conditional distribution of
y for $x = x_0$, say, is represented by a modification of the curve of
intersection of the surface and the plane whose equation is $x = x_0$. Since the
area under the curve is ordinarily not equal to 1, the ordinates of the curve
must be multiplied by the proper number to make the area equal 1 before
the curve will be the graph of a frequency function. The proper number,
of course, is $1/f(x_0)$. Figure 2 indicates this geometrical interpretation
for the frequency function given by (4).

For the problem discussed in (4), the equation for the conditional
frequency function is obtained by applying (7) to (4) and (5); hence for
this problem

(8) $f(y \mid x) = \dfrac{2 - x - y}{\tfrac{3}{2} - x}$

For a fixed value of x this is a linear function of y; hence the graph of
$f(y \mid x)$ must be a straight line, which of course is obvious from Fig. 2
and the geometrical interpretation of $f(y \mid x)$. It will be observed that the
only curve of intersection of the type being considered on the surface
$z = f(x, y)$ that has unit area under it is the one for which $x = \tfrac{1}{2}$. All
other curves of intersection must have their ordinates multiplied by
$1/(\tfrac{3}{2} - x)$ before they will possess unit area.
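
The role of the factor $1/f(x)$ can be seen numerically for the frequency function (4). This is a sketch under stated assumptions: the function names are hypothetical, the marginal is taken as $\tfrac{3}{2} - x$ from (5), and areas are approximated by a midpoint rule over $0 < y < 1$.

```python
def joint(x, y):
    """Joint frequency function (4) on the unit square."""
    return 2 - x - y if 0 < x < 1 and 0 < y < 1 else 0.0


def marginal(x):
    """Marginal frequency function (5)."""
    return 1.5 - x


def conditional(y, x):
    """Formula (7): f(y | x) = f(x, y) / f(x)."""
    return joint(x, y) / marginal(x)


def area_under(g, n=1000):
    """Midpoint-rule area of g over 0 < y < 1."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


for x in (0.1, 0.5, 0.9):
    raw = area_under(lambda y: joint(x, y))            # area of the raw section
    scaled = area_under(lambda y: conditional(y, x))   # after dividing by f(x)
    print(x, round(raw, 6), round(scaled, 6))
```

Only the section at x = 0.5 has unit area before scaling; after division by $f(x)$ every section does.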






Fig. 2. Geometrical representation of a conditional distribution. 


8.1.3 Curve of Regression


This section is concerned with defining a theoretical regression curve
that will serve as a model for empirical regression curves. The preceding
material on marginal and conditional distributions was merely introductory
for use in this section. A theoretical regression curve is basically
the graph of the mean of a conditional distribution $f(y \mid x)$. Here it is
convenient to use the density interpretation of $f(y \mid x)$. Let x have the
fixed value $x_0$. Then along the line $x = x_0$ the mean value of y will
determine a point whose ordinate is denoted by $\mu_{y \mid x_0}$. As different values of
x are selected, different mean points along the corresponding vertical lines
will be obtained. Thus the ordinate $\mu_{y \mid x}$ of the mean point for any such
line is a function of the value of x selected. The locus of such mean points,
that is, the graph of $\mu_{y \mid x}$ as a function of x, will be a curve that is called
the curve of regression of y on x. Analytically, the equation of the curve
of regression is given by the following formula.

(9) Curve of Regression: $\mu_{y \mid x} = \int_{-\infty}^{\infty} y\, f(y \mid x)\, dy$

Because of (7), this formula may also be expressed in the form

(10) $\mu_{y \mid x} = \int_{-\infty}^{\infty} y\, \dfrac{f(x, y)}{f(x)}\, dy$

The curve of regression of x on y is defined in an analogous manner.

Figure 3 indicates the geometrical nature of the preceding definition of
the curve of regression for a general density distribution.

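The frequency function given in (4) will be used to illustrate the
preceding definition. From the result obtained in (8), a direct application
of (9) gives

$$\mu_{y \mid x} = \int_{0}^{1} y\, \frac{2 - x - y}{\tfrac{3}{2} - x}\, dy = \frac{1}{\tfrac{3}{2} - x} \int_{0}^{1} \left[ (2 - x)y - y^2 \right] dy = \frac{4 - 3x}{9 - 6x}$$

This is the equation of a hyperbola. The graph of this curve of regression
is shown as the crossed line in Fig. 2.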

The second illustration in 8.1.1 will be used here to illustrate the
technique of finding the equation of a regression curve when the limits are
variable. From the results obtained there, it follows that

$$f(y \mid x) = \frac{\tfrac{1}{2}xy}{\tfrac{1}{4}x^3} = \frac{2y}{x^2}$$

In view of the triangular nature of the sample space, when x is fixed y
can range over the values from 0 to x only; consequently (9) becomes

$$\mu_{y \mid x} = \frac{2}{x^2} \int_{0}^{x} y^2\, dy = \frac{2}{3}x$$

The fact that the regression curve is a straight line with slope $\tfrac{2}{3}$ might
have been anticipated because of the nature of the density function and
the sample space.
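
Both regression curves can be checked by evaluating formula (10) numerically. The sketch below uses hypothetical helper names and a midpoint rule; it assumes the two joint functions are $2 - x - y$ on the unit square and $\tfrac{1}{2}xy$ on the triangle, and compares the results with the closed forms $(4 - 3x)/(9 - 6x)$ and $\tfrac{2}{3}x$ obtained by carrying out the integrals.

```python
def midpoint_integral(g, lo, hi, n=2000):
    """Approximate the integral of g over [lo, hi] by the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def regression_value(joint, x, y_lo, y_hi):
    """Formula (10): mean of y along the vertical line at x."""
    mass = midpoint_integral(lambda y: joint(x, y), y_lo, y_hi)       # f(x)
    moment = midpoint_integral(lambda y: y * joint(x, y), y_lo, y_hi)
    return moment / mass


def joint_square(x, y):
    """Function (4), used only inside the unit square."""
    return 2 - x - y


def joint_triangle(x, y):
    """Second illustration, assumed to be xy/2, used only for 0 < y < x."""
    return 0.5 * x * y


for x in (0.2, 0.4, 0.8):
    print(x, regression_value(joint_square, x, 0.0, 1.0), (4 - 3 * x) / (9 - 6 * x))

for x in (0.5, 1.5):
    # The upper limit of integration is x itself for the triangular sample space.
    print(x, regression_value(joint_triangle, x, 0.0, x), 2 * x / 3)
```

Dividing the first moment by the mass along the line is the same as integrating against $f(y \mid x)$, so no separate conditional function is needed.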






8.1.4 Moments 

The type of moments needed for correlation differs slightly from
moments that have been defined previously. Although only low-order
moments are required, a general definition is given. These moments are
known as product moments and are defined as follows.

(11) Product Moment: $\mu'_{pq} = E(x^p y^q) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^p y^q f(x, y)\, dy\, dx$

Here p and q are any non-negative integers. The corresponding product
moment about the mean is defined by the formula

(12) $\mu_{pq} = E[(x - \mu_x)^p (y - \mu_y)^q] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_x)^p (y - \mu_y)^q f(x, y)\, dy\, dx$

It will be observed that these definitions are special cases of the general
definition for expected values given in (5), Chapter 6, in which
$g(x_1, x_2, \ldots, x_n)$ is chosen as either the function $x_1^p x_2^q$ or
$(x_1 - \mu_1)^p (x_2 - \mu_2)^q$ and $n = 2$.

The particular product moment $\mu_{11}$, which is called the covariance of
the two variables, is of special interest because the theoretical correlation
coefficient $\rho$ between the two variables is defined in terms of it.

(13) Theoretical Correlation Coefficient: $\rho = \dfrac{\mu_{11}}{\sigma_x \sigma_y}$

If (12) is compared with $\sum (x - \bar{x})(y - \bar{y})/n$ in (1), Chapter 7, it will be
observed that (12) with $p = q = 1$ is the theoretical counterpart of this
sum and that (13) is the theoretical counterpart of r.

By using formula (3) it is easily seen that the kth moment of x, say,
can be obtained from (11) by choosing $p = k$ and $q = 0$. Thus it follows
that

(14) $\mu'_{00} = 1, \quad \mu'_{10} = \mu_x, \quad \mu'_{01} = \mu_y, \quad \mu_{20} = \sigma_x^2, \quad \mu_{02} = \sigma_y^2$

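As an illustration of how to calculate the theoretical correlation
coefficient, consider the application of formula (13) to the problem first
considered in (4). By symmetry and (5), it follows that

$$\mu_x = \mu_y = \int_{0}^{1} x \left( \tfrac{3}{2} - x \right) dx = \frac{5}{12}$$

Formula (12) yields

$$\sigma_x^2 = \sigma_y^2 = \int_{0}^{1} \left( x - \tfrac{5}{12} \right)^2 \left( \tfrac{3}{2} - x \right) dx = \frac{11}{144}$$

Formula (12) applied to (4) gives

$$\mu_{11} = \int_{0}^{1} \int_{0}^{1} \left( x - \tfrac{5}{12} \right) \left( y - \tfrac{5}{12} \right) (2 - x - y)\, dy\, dx = -\frac{1}{144}$$

Formula (13) applied to these results gives

$$\rho = \frac{-1/144}{11/144} = -\frac{1}{11}$$

An inspection of Fig. 2 shows that the regression curve has a slight negative
slope throughout its range and therefore it is not surprising that $\rho$,
which measures linear correlation, turned out to be negative.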
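
The value $\rho = -1/11$ can be confirmed by evaluating the double integrals in (11) through (13) numerically. The following is a minimal sketch with hypothetical names; it assumes the frequency function (4) and uses a midpoint rule on a grid over the unit square, so the results are approximations rather than exact fractions.

```python
def double_midpoint(g, n=400):
    """Midpoint-rule integral of g(x, y) over the unit square."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(g(x, y) for x in pts for y in pts)


def f(x, y):
    """Joint frequency function (4) inside the unit square."""
    return 2 - x - y


# Means, from (11) with (p, q) = (1, 0) and (0, 1).
mu_x = double_midpoint(lambda x, y: x * f(x, y))
mu_y = double_midpoint(lambda x, y: y * f(x, y))

# Variances and covariance, from (12).
var_x = double_midpoint(lambda x, y: (x - mu_x) ** 2 * f(x, y))
var_y = double_midpoint(lambda x, y: (y - mu_y) ** 2 * f(x, y))
cov = double_midpoint(lambda x, y: (x - mu_x) * (y - mu_y) * f(x, y))

# Theoretical correlation coefficient, formula (13).
rho = cov / (var_x * var_y) ** 0.5
print(mu_x, var_x, cov, rho)
```

The printed values can be compared with 5/12, 11/144, -1/144, and -1/11.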


8.2 Normal Distribution of Two Variables 


The preceding sections have introduced theoretical counterparts of
empirical regression and correlation and to that extent have presented
mathematical models for those two statistical quantities. It is not possible,
however, to work problems of statistical inference with respect to them
unless one is supplied with a frequency function for doing so. From
the point of view of correlation the frequency function must be such that
the density distribution of points in the x,y plane will indicate a linear
type relation between x and y because the correlation coefficient is useful
as a measure of relationship only when the relationship is approximately
linear. This places a considerable restriction on the type of frequency
function that can be selected as a model. Unless one wants a model for
linear regression only, there is no such restriction on the frequency function
necessary for regression models.

Now, since the normal frequency function has been shown to be a
useful mathematical model for distributions of a single continuous
variable, it is to be expected that a joint normal frequency function for
two continuous variables will also prove to be a useful model. If two
random variables x and y are normally distributed but in addition are
independently distributed, then their joint frequency function is easily
written down because, from (24), Chapter 2, the joint frequency function
is then the product of the two marginal frequency functions. In this case,
therefore,

(15) $f(x, y) = \dfrac{e^{-\frac{1}{2}\left(\frac{x - \mu_x}{\sigma_x}\right)^2}}{\sqrt{2\pi}\,\sigma_x} \cdot \dfrac{e^{-\frac{1}{2}\left(\frac{y - \mu_y}{\sigma_y}\right)^2}}{\sqrt{2\pi}\,\sigma_y} = \dfrac{e^{-\frac{1}{2}\left[\left(\frac{x - \mu_x}{\sigma_x}\right)^2 + \left(\frac{y - \mu_y}{\sigma_y}\right)^2\right]}}{2\pi \sigma_x \sigma_y}$

If the variables x and y are not independently distributed, it is necessary
to modify (15) to take account of the relationship between x and y. This
is done by introducing a cross-product term in the exponent of (15) which
is such that its coefficient will be equal to 0 when x and y are independent.
The desired modification is accomplished by means of the following
definition.


(16) Definition: The normal frequency function of two variables is
given by the following formula, where $-1 < \rho < 1$,

$$f(x, y) = \frac{e^{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x - \mu_x}{\sigma_x}\right)\left(\frac{y - \mu_y}{\sigma_y}\right) + \left(\frac{y - \mu_y}{\sigma_y}\right)^2\right]}}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}}$$

If the same approach had been used here as for a normal distribution 
of one variable, one would have defined a joint normal frequency func- 
tion as an exponential function of two variables in which the exponent is 
a quadratic function of those variables, and then one would have pro- 
ceeded to show that the parameters defining the function can be expressed 
in terms of familiar statistical parameters. The result of such an approach 
is the expression given in (16). As a consequence, the function defined in 
(16) possesses the properties of a joint frequency function and its param- 
eters are consistent with the general moment properties given in (14) and 
(13). This implies, for example, that the parameter $\rho$ in (16) is actually
the theoretical correlation coefficient here, as defined in (13). These 
facts can be verified by evaluating the necessary integrals. 


8.2.1 Marginal Distribution 

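The marginal distributions of a joint normal distribution are obtained
by applying formula (3), and its y version, to (16). For example, the x
marginal frequency function is given by

(17) $f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$

where $f(x, y)$ is given in (16). In order to simplify the integration, let
$u = (x - \mu_x)/\sigma_x$ and introduce the change of variable $v = (y - \mu_y)/\sigma_y$.
Then $dy = \sigma_y\, dv$ and (17) reduces to

$$f(x) = \frac{1}{2\pi \sigma_x \sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2(1 - \rho^2)}(u^2 - 2\rho u v + v^2)}\, dv$$

Adding and subtracting $\rho^2 u^2$ to the exponent in order to complete the
square in v gives

$$
\begin{aligned}
f(x) &= \frac{1}{2\pi \sigma_x \sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2(1 - \rho^2)}(v^2 - 2\rho u v + \rho^2 u^2 + u^2 - \rho^2 u^2)}\, dv \\
&= \frac{e^{-\frac{u^2}{2}}}{2\pi \sigma_x \sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} e^{-\frac{(v - \rho u)^2}{2(1 - \rho^2)}}\, dv
\end{aligned}
$$

Now make the change of variable $z = (v - \rho u)/\sqrt{1 - \rho^2}$. Then $dv =
\sqrt{1 - \rho^2}\, dz$ and $f(x)$ reduces to

$$f(x) = \frac{e^{-\frac{u^2}{2}}}{2\pi \sigma_x} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz$$

Substituting back the value of u in terms of x and inserting the value
$\sqrt{2\pi}$ for this familiar integral, $f(x)$ finally reduces to

(18) $f(x) = \dfrac{e^{-\frac{1}{2}\left(\frac{x - \mu_x}{\sigma_x}\right)^2}}{\sqrt{2\pi}\,\sigma_x}$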


Since the corresponding result for y follows from symmetry, (18)
shows that the marginal distributions of a joint normal distribution are
normal. This result was to be expected, because one would certainly
have been unhappy with the definition of a joint normal distribution if
the individual variables had not been normally distributed.

The result obtained in (18) is very convenient for demonstrating the
consistency of definition (16) with several of the general moment
properties given in (14). For example, in order to demonstrate that the
constant in (16) has been properly chosen, it is necessary to show that the
volume under the surface whose equation is given by (16) is equal to 1.
Hence it is necessary to evaluate the double integral

(19) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dy\, dx$

where $f(x, y)$ is given by (16). But from (17) the result of integrating with
respect to y is given by (18); therefore, the evaluation of (19) is reduced
to the integration of (18) with respect to x over all values of x. The value
of this integral, of course, is 1.

If one sets $\rho = 0$ in (16), it will be observed that (16) reduces to (15),
which is the frequency function of two independent normal variables.
This shows that if two normal variables are uncorrelated they are
independently distributed. From the discussion of correlation given in
section 7.1, particularly with respect to diagram (e) of Fig. 3, in Chapter
7, it should be clear that a lack of linear correlation does not ordinarily
guarantee a lack of relationship of every kind between the two variables.
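
The reduction of (17) to (18) can be checked numerically. The sketch below is only illustrative: the function names and the parameter values are arbitrary choices made here, definition (16) is assumed to have its standard form, and the integral over y is approximated by a midpoint rule over a range of ten standard deviations on either side of $\mu_y$.

```python
import math


def joint_normal(x, y, mx, my, sx, sy, rho):
    """Normal frequency function of two variables, assumed form of (16)."""
    u = (x - mx) / sx
    v = (y - my) / sy
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))


def normal(x, m, s):
    """Normal frequency function of one variable, as in (18)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (math.sqrt(2 * math.pi) * s)


def marginal_x(x, mx, my, sx, sy, rho, n=4000, width=10.0):
    """Formula (17) by the midpoint rule over my - width*sy to my + width*sy."""
    lo = my - width * sy
    h = 2 * width * sy / n
    return h * sum(joint_normal(x, lo + (i + 0.5) * h, mx, my, sx, sy, rho)
                   for i in range(n))


# Arbitrary illustrative parameters.
params = dict(mx=1.0, my=-2.0, sx=2.0, sy=0.5, rho=0.6)
for x in (-1.0, 1.0, 2.5):
    print(x, marginal_x(x, **params), normal(x, params["mx"], params["sx"]))
```

The two printed columns should agree for any $\rho$ between -1 and 1, which is the content of (18): the marginal does not depend on $\rho$.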


8.2.2 Conditional Distribution 

A joint normal distribution of two variables possesses conditional
distributions with interesting properties. In order to study these
properties, it will suffice to examine the conditional frequency function
$f(y \mid x)$.

For ease of writing, let $u = (x - \mu_x)/\sigma_x$ and $v = (y - \mu_y)/\sigma_y$. Then,
a direct application of definition (7) to (16) and (18), together with a few
algebraic reductions, will give

$$f(y \mid x) = \frac{e^{-\frac{1}{2(1 - \rho^2)}(v^2 - 2\rho u v + \rho^2 u^2)}}{\sqrt{2\pi}\,\sigma_y \sqrt{1 - \rho^2}} = \frac{e^{-\frac{1}{2}\left(\frac{v - \rho u}{\sqrt{1 - \rho^2}}\right)^2}}{\sqrt{2\pi}\,\sigma_y \sqrt{1 - \rho^2}}$$

If the values of u and v in terms of x and y are inserted and if the value
of y is denoted by $y_x$ to show its dependence on the selected value of x,
$f(y \mid x)$ will reduce to

(20) $f(y \mid x) = \dfrac{e^{-\frac{1}{2}\left[\frac{y_x - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x)}{\sigma_y \sqrt{1 - \rho^2}}\right]^2}}{\sqrt{2\pi}\,\sigma_y \sqrt{1 - \rho^2}}$

Since x has a fixed value and $y_x$ is the random variable here, (20) shows
that $y_x$ possesses a normal distribution with mean $\mu_y + \rho(\sigma_y/\sigma_x)(x - \mu_x)$
and standard deviation $\sigma_y \sqrt{1 - \rho^2}$. By symmetry a similar result holds
for x and y interchanged. Thus the conditional distributions of a joint
normal distribution are also normal.

Since by definition (9) a curve of regression is the locus of the means
of a conditional distribution, it follows from (20) that the curve of
regression of y on x for x and y jointly normally distributed is the straight
line whose equation is

\[
\mu_{y \mid x} = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \mu_x) \tag{21}
\]

This property of a joint normal distribution, namely, that the curve 
of regression of y on x is a straight line, helps to justify the frequent use
of linear regression because variables that are approximately normally 
distributed are encountered frequently. 
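The following Python sketch may help in checking the conditional distribution numerically. It assumes that (16) is the usual bivariate normal frequency function and that (18) is the normal marginal frequency function of x; the function names and the parameter values are illustrative choices, not part of the text. It divides the joint density by the marginal density, as definition (7) requires, and compares the quotient with a normal density having the mean and standard deviation read off from (20).

```python
import math

def normal_density(t, mu, sigma):
    """Univariate normal frequency function."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def bivariate_density(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal frequency function, assumed here to be the form (16)."""
    u = (x - mu_x) / sigma_x
    v = (y - mu_y) / sigma_y
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho ** 2)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho ** 2))

def conditional_mean_sd(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and standard deviation of y for fixed x, as read off from (20)."""
    mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    sd = sigma_y * math.sqrt(1.0 - rho ** 2)
    return mean, sd

# Illustrative (made-up) parameter values.
params = dict(mu_x=4.0, mu_y=2.0, sigma_x=2.0, sigma_y=1.0, rho=0.5)
x, y = 6.0, 1.7

mean, sd = conditional_mean_sd(x, **params)
quotient = bivariate_density(x, y, **params) / normal_density(x, params["mu_x"], params["sigma_x"])
direct = normal_density(y, mean, sd)
print(mean, sd, quotient, direct)
```

For these values the regression line (21) gives a conditional mean of 2.5, and the two density values agree to within rounding error.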


8.2.3 Normal Surface 


Instead of thinking in terms of probability density in the plane, con- 
sider now the geometry of (16), treating it as the equation of a surface in 
three dimensions. If (7) and the particular results (18) and (20) are 
applied, the equation of this surface may be written 


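\[
f(x, y) = f(x)\,\frac{\exp\left\{-\frac{1}{2}\left[\frac{y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x)}{\sigma_y\sqrt{1-\rho^2}}\right]^2\right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \tag{22}
\]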


For the purpose of studying this surface consider its intersections
with planes perpendicular to the x axis. The equations of the intersecting
curves are obtained by replacing x with the constant values corresponding
to the cutting planes. From (22) it will be observed that these curves are
normal curves, although not the graphs of normal frequency functions
because the area under any such curve is not usually equal to one, with
their means lying on the regression line (21), all having the same standard
deviation $\sigma_y\sqrt{1-\rho^2}$, and varying in maximum height according to the
factor $f(x)$. The tallest such normal curve is the one lying in the cutting
plane $x = \mu_x$, since this value makes $f(x)$ a maximum. By symmetry,
planes perpendicular to the y axis will intersect the surface in normal
curves with corresponding properties. A sketch of a normal correlation
surface which shows these various geometrical properties is given in
Fig. 4.

Further information is obtained by considering the intersection of the 
surface by planes perpendicular to the z axis. In this connection it is
more convenient to use the original form (16) with $f(x, y)$ replaced by z.
If z assumes different constant values, the quantity in brackets in the
exponent will assume corresponding values that can be calculated from
the constant values assigned to z. Hence the equations of such intersecting
curves may be written in the form

\[
\left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x - \mu_x}{\sigma_x}\right)\left(\frac{y - \mu_y}{\sigma_y}\right) + \left(\frac{y - \mu_y}{\sigma_y}\right)^2 = k
\]


where k corresponds to the selected value of z. Since this is a quadratic 
function in x and y, these curves of intersection must be conic sections. 
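Furthermore, since the type of conic section depends only on the quadratic
terms, the discriminant for testing conic sections may be applied
directly to give

\[
\left(\frac{2\rho}{\sigma_x\sigma_y}\right)^2 - \frac{4}{\sigma_x^2\sigma_y^2}
= \frac{4(\rho^2 - 1)}{\sigma_x^2\sigma_y^2} < 0
\]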


This result shows that the intersecting curves are ellipses, because by
definition (16) $\rho^2 < 1$. Allowing k to assume different values will merely
change the sizes of these ellipses; consequently, these ellipses have the
same centers and the same orientation of principal axes. It will be found,
when rotating axes properly to eliminate the xy term, that the principal
axis of these ellipses is not parallel to a line of regression as might be
supposed. The line of regression turns out to be parallel to the diameter
of the ellipses obtained by considering chords parallel to the y axis.
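A short numerical sketch can make the last two statements concrete. It relies on a standard result that is not derived in the text, namely that the major axis of these ellipses makes an angle $\phi$ with the x axis satisfying $\tan 2\phi = 2\rho\sigma_x\sigma_y/(\sigma_x^2 - \sigma_y^2)$; the function names and the parameter values are illustrative assumptions.

```python
import math

def principal_axis_slope(sigma_x, sigma_y, rho):
    """Slope of the major axis, using tan(2*phi) = 2*b / (a - c),
    a standard rotation-of-axes result assumed here."""
    a = sigma_x ** 2
    b = rho * sigma_x * sigma_y
    c = sigma_y ** 2
    phi = 0.5 * math.atan2(2.0 * b, a - c)
    return math.tan(phi)

def regression_slope(sigma_x, sigma_y, rho):
    """Slope of the regression line of y on x, from (21)."""
    return rho * sigma_y / sigma_x

def conic_discriminant(sigma_x, sigma_y, rho):
    """Discriminant of the quadratic terms; negative for an ellipse."""
    return 4.0 * (rho ** 2 - 1.0) / (sigma_x ** 2 * sigma_y ** 2)

# Illustrative (made-up) parameter values.
sigma_x, sigma_y, rho = 2.0, 1.0, 0.5
m_axis = principal_axis_slope(sigma_x, sigma_y, rho)
m_reg = regression_slope(sigma_x, sigma_y, rho)
disc = conic_discriminant(sigma_x, sigma_y, rho)
print(m_axis, m_reg, disc)
```

For these values the discriminant is negative, and the slope of the major axis (about .30) differs from the slope of the regression line (.25), in agreement with the remark that the two directions do not coincide.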


8.3 Normal Correlation 


The normal frequency function defined in (16) appears to be a satisfactory
model for correlation problems because it yields a probability
density distribution in the plane for which the regression is linear
and because it possesses a parameter that measures the theoretical
correlation present. In addition, experience shows that many pairs of
real-life variables possess distributions of approximately this type.

In Chapter 7 the sample correlation coefficient r was introduced as a
measure of the degree to which two variables are linearly related. It was
stated there that the justification for choosing r as the preferred measure
rested upon the fact that it is the maximum likelihood estimator of the
theoretical correlation coefficient $\rho$ when the two variables possess a
joint normal distribution. This property of r will now be verified.

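Let $(x_1, y_1), \ldots, (x_n, y_n)$ represent a random sample of size n
from a normal population whose frequency function is given by (16).
The likelihood function for this sample is

\[
L = \prod_{i=1}^{n} f(x_i, y_i)
= \left(2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}\right)^{-n}
\exp\left\{-\frac{1}{2(1-\rho^2)}\sum_{i=1}^{n}\left[\left(\frac{x_i-\mu_x}{\sigma_x}\right)^2
- 2\rho\left(\frac{x_i-\mu_x}{\sigma_x}\right)\left(\frac{y_i-\mu_y}{\sigma_y}\right)
+ \left(\frac{y_i-\mu_y}{\sigma_y}\right)^2\right]\right\}
\]

For ease of differentiating, the logarithms of both sides are taken. Then

\[
\log L = -n\log\left(2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}\right)
- \frac{1}{2(1-\rho^2)}\sum_{i=1}^{n}\left[\left(\frac{x_i-\mu_x}{\sigma_x}\right)^2
- 2\rho\left(\frac{x_i-\mu_x}{\sigma_x}\right)\left(\frac{y_i-\mu_y}{\sigma_y}\right)
+ \left(\frac{y_i-\mu_y}{\sigma_y}\right)^2\right]
\]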


In order to find the maximum likelihood estimators of the parameters,
it is necessary to differentiate log L with respect to $\mu_x$, $\mu_y$, $\sigma_x$,
$\sigma_y$, and $\rho$, and then to solve the five equations that are obtained by
setting these five derivatives equal to 0. It will be found on solving the
first two of these equations that the maximum likelihood estimators for
$\mu_x$ and $\mu_y$ are, as was to be expected, $\bar{x}$ and $\bar{y}$. It is not possible to
solve the third and fourth equations alone for the estimators for $\sigma_x$ and
$\sigma_y$ because they involve $\rho$; therefore, the remaining three equations need
to be solved simultaneously.





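Differentiating log L with respect to $\sigma_x$ gives

\[
\frac{\partial \log L}{\partial \sigma_x} = -\frac{n}{\sigma_x}
- \frac{1}{2(1-\rho^2)}\sum_{i=1}^{n}\left[-\frac{2(x_i-\mu_x)^2}{\sigma_x^3}
+ \frac{2\rho(x_i-\mu_x)(y_i-\mu_y)}{\sigma_x^2\sigma_y}\right]
\]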

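A similar formula results from differentiating with respect to $\sigma_y$. Setting
these two derivatives equal to 0 will yield the equations

\[
\frac{\sum(x_i-\mu_x)^2}{\sigma_x^2} = \frac{\rho\sum(x_i-\mu_x)(y_i-\mu_y)}{\sigma_x\sigma_y} + n(1-\rho^2)
\]

\[
\frac{\sum(y_i-\mu_y)^2}{\sigma_y^2} = \frac{\rho\sum(x_i-\mu_x)(y_i-\mu_y)}{\sigma_x\sigma_y} + n(1-\rho^2)
\]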


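If now $\mu_x$ and $\mu_y$ are replaced by $\bar{x}$ and $\bar{y}$, which are the solutions of the
first two of the maximum likelihood equations, and if the notation
$\sum(x_i - \bar{x})^2 = ns_x^2$, $\sum(y_i - \bar{y})^2 = ns_y^2$, and
$\sum(x_i - \bar{x})(y_i - \bar{y}) = nrs_xs_y$ is used, these equations will simplify to

\[
\frac{s_x^2}{\sigma_x^2} = \rho r\,\frac{s_xs_y}{\sigma_x\sigma_y} + (1-\rho^2)
\]

\[
\frac{s_y^2}{\sigma_y^2} = \rho r\,\frac{s_xs_y}{\sigma_x\sigma_y} + (1-\rho^2)
\]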


This shows that

\[
\frac{s_x}{\sigma_x} = \frac{s_y}{\sigma_y}
\]

and therefore, substituting into the first equation, that

\[
\frac{s_x^2}{\sigma_x^2} = \frac{1-\rho^2}{1-\rho r}
\]

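The fifth maximum likelihood equation is obtained by differentiating
log L with respect to $\rho$. This differentiation, followed by some algebraic
simplification, yields the result

\[
\frac{\partial \log L}{\partial \rho} = \frac{n\rho}{1-\rho^2}
+ \frac{1}{1-\rho^2}\sum\left(\frac{x_i-\mu_x}{\sigma_x}\right)\left(\frac{y_i-\mu_y}{\sigma_y}\right)
- \frac{\rho}{(1-\rho^2)^2}\sum\left[\left(\frac{x_i-\mu_x}{\sigma_x}\right)^2
- 2\rho\left(\frac{x_i-\mu_x}{\sigma_x}\right)\left(\frac{y_i-\mu_y}{\sigma_y}\right)
+ \left(\frac{y_i-\mu_y}{\sigma_y}\right)^2\right]
\]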

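Setting this derivative equal to 0 and performing some simplifications
yields the equation

\[
n\rho(1-\rho^2) + (1+\rho^2)\sum\left(\frac{x_i-\mu_x}{\sigma_x}\right)\left(\frac{y_i-\mu_y}{\sigma_y}\right)
- \rho\sum\left[\left(\frac{x_i-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y_i-\mu_y}{\sigma_y}\right)^2\right] = 0
\]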



Now, replacing $\mu_x$ and $\mu_y$ by $\bar{x}$ and $\bar{y}$ and $s_x/\sigma_x$ and $s_y/\sigma_y$ by

\[
\sqrt{(1-\rho^2)/(1-\rho r)},
\]

this equation reduces to

\[
\rho(1-\rho^2) + r(1+\rho^2)\,\frac{1-\rho^2}{1-\rho r} - 2\rho\,\frac{1-\rho^2}{1-\rho r} = 0
\]

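Since $1 - \rho^2$ may be factored out, this equation is easily seen to possess
the solution $\rho = r$: after the factor $1 - \rho^2$ is removed and the result is
multiplied by $1 - \rho r$, what remains is
$\rho(1 - \rho r) + r(1 + \rho^2) - 2\rho = r - \rho = 0$. Consequently this proves
that the maximum likelihood estimator of $\rho$ when x and y are jointly normally
distributed is the sample correlation coefficient r.

Incidentally, when $\rho$ is replaced by r in the expression for $s_x^2/\sigma_x^2$, it will
be observed that the maximum likelihood estimator of $\sigma_x$ reduces to $s_x$,
and by symmetry that of $\sigma_y$ reduces to $s_y$. Thus the joint maximum
likelihood estimators of $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, and $\rho$
are $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, and r.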
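The result can be checked numerically. The sketch below uses a small made-up sample and illustrative function names; it assumes, as in the derivation, that $s_x$ and $s_y$ are computed with divisor n. It evaluates the logarithm of the likelihood at $(\bar{x}, \bar{y}, s_x, s_y, r)$ and at several nearby parameter values, none of which should give a larger value.

```python
import math

def sample_moments(xs, ys):
    """Return x-bar, y-bar, s_x, s_y (divisor n), and r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n * s_x * s_y)
    return x_bar, y_bar, s_x, s_y, r

def log_likelihood(xs, ys, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Logarithm of the bivariate normal likelihood for the sample."""
    n = len(xs)
    q = 0.0
    for x, y in zip(xs, ys):
        u = (x - mu_x) / sigma_x
        v = (y - mu_y) / sigma_y
        q += u * u - 2.0 * rho * u * v + v * v
    constant = -n * math.log(2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho ** 2))
    return constant - q / (2.0 * (1.0 - rho ** 2))

# Made-up sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

x_bar, y_bar, s_x, s_y, r = sample_moments(xs, ys)
best = log_likelihood(xs, ys, x_bar, y_bar, s_x, s_y, r)

# Nearby parameter values; each should give a smaller log likelihood.
perturbed = [
    (x_bar + 0.1, y_bar, s_x, s_y, r),
    (x_bar, y_bar - 0.1, s_x, s_y, r),
    (x_bar, y_bar, 1.1 * s_x, s_y, r),
    (x_bar, y_bar, s_x, 0.9 * s_y, r),
    (x_bar, y_bar, s_x, s_y, r + 0.05),
    (x_bar, y_bar, s_x, s_y, r - 0.05),
]
all_lower = all(log_likelihood(xs, ys, *p) < best for p in perturbed)
print(r, best, all_lower)
```

Such a check does not replace the derivation, but it is a quick guard against algebraic slips when the five equations are worked through by hand.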


8.4 Normal Regression 

In Chapter 7 the study of the relationship between two or more variables 
was considered from two points of view, namely, that of correlation and 
that of regression. Correlation methods were considered appropriate 
when interest is centered on measuring the degree to which two variables 
are linearly related and when both variables are randomly sampled. The 
theory presented thus far in this chapter has been principally theory for 
correlation because it has been the theory of two correlated random 
variables. Some of this theory, however, is also useful in the construction
of mathematical models for regression. Such a model is considered next.

In all the regression problems of Chapter 7 the independent variables
were considered to be fixed so that y was the only random variable present.
For example, in the illustration of linear regression given in Table 2,
Chapter 7, the values of x were selected by the experimenter to be equally
spaced over the range of x values of interest to him. Repeated experiments
of this type would require that the experimenter use the same x
values each time. It is clear that the joint distribution of two variables is
not needed for a regression problem such as this.

Although the joint distribution of x and y is not needed for regression,
the conditional distribution of y for x fixed is needed if the accuracy
of the least-squares estimates of the regression coefficients obtained in
Chapter 7 is to be determined. In the notation of (7) this means that the
conditional frequency function $f(y \mid x)$ must be known for all the fixed
x values. In the case of multiple regression x will be understood to
represent all the independent variables.





Consider, first, the problem of deciding what type of conditional
frequency function $f(y \mid x)$ would make a satisfactory model for simple
linear regression of y on x. Although the x values are fixed in regression,
x usually possesses a continuous distribution under random sampling.
If x and y possess a joint normal distribution, the regression curve will be
a straight line and $f(y \mid x)$ will be given by (20). For two variables that
are approximately normally distributed one would therefore choose
$f(y \mid x)$ to be a function having the properties of the function given by
(20). Hence $f(y \mid x)$ is chosen to be a normal frequency function with its
mean, as a function of x, lying on a straight line and with its variance
independent of x.

Since the x's are to be fixed, the conditional distribution of y for fixed
x is needed only for the y's corresponding to the fixed x values. Thus,
denoting the set of sample values by $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the
conditional frequency function $f(y \mid x)$ is needed only for these n pairs of
values. If the equation of the straight line on which the means of the
conditional distributions lie is written in the form

\[
y = \alpha + \beta(x - \bar{x})
\]

and the variance is denoted by $\sigma^2$, the desired conditional frequency
function is then given by

\[
f(y_i \mid x_i) = \frac{e^{-\frac{1}{2}\left[\frac{y_i - \alpha - \beta(x_i - \bar{x})}{\sigma}\right]^2}}{\sqrt{2\pi}\,\sigma} \tag{23}
\]

A sketch illustrating the preceding assumption concerning the conditional
distribution of the y's is given in Fig. 5.

Fig. 5. Distribution assumptions for linear regression.

A second assumption is that the random variables $y_1, y_2, \ldots, y_n$ are
independently distributed. This assumption is satisfied in
the regression problem discussed in Section 7.2. It is not satisfied, however,
for regression problems in which $y_1, y_2, \ldots, y_n$ represent, say, the
heights of a growing plant on n consecutive days.


8.4.1 Estimation of $\alpha$, $\beta$, and $\sigma$

The model selected in the preceding section for simple linear regression
contains the three parameters $\alpha$, $\beta$, and $\sigma$. In order to be able to apply
this model to a given problem, it is necessary to estimate these parameters
by means of available data. This is done by the method of maximum
likelihood.

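Since the random variables $y_1, y_2, \ldots, y_n$ were assumed to be
independently distributed, the likelihood function L for the sample
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is given by

\[
L = \prod_{i=1}^{n} \frac{e^{-\frac{1}{2}\left[\frac{y_i - \alpha - \beta(x_i - \bar{x})}{\sigma}\right]^2}}{\sqrt{2\pi}\,\sigma}
= \frac{e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[y_i - \alpha - \beta(x_i - \bar{x})\right]^2}}{\sigma^n(2\pi)^{n/2}}
\]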

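Taking logarithms and differentiating with respect to $\alpha$, $\beta$, and $\sigma$,
respectively, and setting the derivatives equal to 0, one obtains

\[
\begin{aligned}
\frac{\partial \log L}{\partial \alpha} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[y_i - \alpha - \beta(x_i - \bar{x})\right] = 0 \\
\frac{\partial \log L}{\partial \beta} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[y_i - \alpha - \beta(x_i - \bar{x})\right](x_i - \bar{x}) = 0 \\
\frac{\partial \log L}{\partial \sigma} &= -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}\left[y_i - \alpha - \beta(x_i - \bar{x})\right]^2 = 0
\end{aligned} \tag{24}
\]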




Now, a comparison of the first two of these equations with the equations
of least squares as given by (5), Chapter 7, demonstrates the fact that the
least-squares estimates of the regression coefficients are precisely the same
as the estimates obtained by the method of maximum likelihood under
the stipulated normality and independence assumptions. Since estimation
by the method of least squares does not require the restrictive assumptions
made in the maximum likelihood approach, it would seem to be a
more desirable method for estimating $\alpha$ and $\beta$. However, as soon as one
tries to determine the accuracy of the estimates or to test hypotheses
about the parameters being estimated, he will discover that it is necessary
to make some distribution assumptions about the y's. The normality
and independence assumptions made earlier are assumptions that enable
one to work such problems, in addition to estimating the regression
parameters. Least squares alone is capable of estimating the parameters
only. The problems of determining the accuracy of the estimates and
testing hypotheses about the parameters being estimated are considered
in Chapter 11.

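If the estimates of $\alpha$ and $\beta$ obtained from (24) and given by (7), Chapter
7, are denoted by $\hat{\alpha}$ and $\hat{\beta}$, the third equation in (24) will yield the
following estimate for $\sigma^2$:

\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})\right]^2
\]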
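As a rough illustration, the three equations in (24) can be solved explicitly and the solution computed in a few lines. The first equation gives $\hat{\alpha} = \bar{y}$, the second gives $\hat{\beta} = \sum(x_i - \bar{x})y_i / \sum(x_i - \bar{x})^2$, and the third gives the estimate of $\sigma^2$ with divisor n. The function name and the data below are made up for the illustration.

```python
def fit_linear_regression(xs, ys):
    """Maximum likelihood estimates for the model y = alpha + beta*(x - x_bar).

    Returns alpha_hat, beta_hat, sigma2_hat (divisor n), and the residuals.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    alpha_hat = sum(ys) / n                      # first equation of (24)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum((x - x_bar) * y for x, y in zip(xs, ys)) / sxx   # second equation
    residuals = [y - alpha_hat - beta_hat * (x - x_bar) for x, y in zip(xs, ys)]
    sigma2_hat = sum(e * e for e in residuals) / n                  # third equation
    return alpha_hat, beta_hat, sigma2_hat, residuals

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

alpha_hat, beta_hat, sigma2_hat, residuals = fit_linear_regression(xs, ys)
print(alpha_hat, beta_hat, sigma2_hat)
```

For these data the estimates are 3.2, 1.0, and .96. Note that the maximum likelihood estimate of $\sigma^2$ divides by n; other divisors are sometimes preferred, but they are not what (24) yields.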

A mathematical model for multiple linear regression can be constructed
in the same manner as for simple linear regression. If there are k independent
variables $x_1, \ldots, x_k$, and all the x's have fixed values, so that the
only random variables are $y_1, \ldots, y_n$, the conditional frequency
function of $y_i$ corresponding to (23) will become

\[
f(y_i \mid x_{1i}, \ldots, x_{ki}) = \frac{e^{-\frac{1}{2\sigma^2}\left[y_i - \beta_0 - \beta_1x_{1i} - \cdots - \beta_kx_{ki}\right]^2}}{\sqrt{2\pi}\,\sigma}
\]

The maximum likelihood estimates of the regression coefficients
$\beta_0, \beta_1, \ldots, \beta_k$ are the same as those obtained by least squares and are
given by solving the normal equations (13) of Chapter 7.


REFERENCES 


Additional discussion and problems on the material of this chapter may be found in 
A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill Book Co. 



EXERCISES

1. lff{x,y) = I, 0 ^ X ^ 1, 0 ^ 2/ ^ 1, find the probability that («) a; > .5 

and y > J; (b) x > .5; (c) x > y; (d) x = y; (e) x > .5, given th4 y = -5; 

(/) X >y^ given that y < .5; {g) x y <\\ and {h) x^-\-y^<\. 

2. If f(x,y) = X > 0, 2/ ^ 0> find the probability that (4 x <1 ; 

(b)x < 1, given that 2/ — 1; (c)x > y\ {d)x + y <1; and(e)a; > 2/, given that 
2 / < 1 . 

3. If f(x^y') == 2, 0 ^ a? ^ 1, 0 ^ 2/ ^ find (fl) the marginal 4equency 
functions ; (6) the conditional frequency functions ; and (c) the curve of regression 
oiyonx. 

A, Given /(a?, 2/) = c(xy + e%0 ^ 1,0 ^y ^ 1, (a) find the v^ne of c; 

(b) find f(x); and (c) determine whether x and y are independently di^ributed. 

5. Given /(aj, y) = x '^0, y '^0, find (a) the marginal frequency 

functions; (ib) the conditional frequency functions* and (c) the curve pf regres- 
sion of 2/ on a?. 

6. Find the equation of the regression curve of y on a?, given that (a?, y) = 
|(1 + a; 4- y)l(l + x)\l A-y)\x>0,y > 0. 

7. Given/(a;, 2/) « 1,0 ^ a? ^ 1,0 ^y ^ 1, find(fl)iW^/; (b) p; and (c) 

8. Given/(a3,2/) =2/^2^0 ^ a; ^a,0 :<2/ a?,find(a)/^p/; and(c)/^y|a.. 

9. Given f(x, y) =* c in the two triangular regions bounded by the lines a? = 
— 1, y =^0, y = —a; and x = 1, 2/ —0,y=x, find (a) the value of c; (Jb) the 
equation of the regression curve of y on x\ and (c) the value of />. 

10. Given /(a;, y) = c(x^ + 2/^), x^ +2/^ <1, and zero elsewhere, finjd (a) the 

value of c; (b) the equation of the regression curve of 2/ on x\ and (c) the equation 
of the regression curve of aj on 2/. ! 

11. Find a nontrivial joint distribution of two variables x and y such that the
regression curve of y on x is the parabola $y = x^2$.

12. Given $f(x, y) = 8xy$, $0 \le x \le 1$, $0 \le y \le x$, show that x and y are not
independent random variables.

13. If the exponent in the normal frequency function of x and 2/ is ; 

-fWa; - 1)2 - 9.6(a; - l)(y + 2) -I- 16(y -h 2)2] 

find (a) fly,, fly, Oy., Oy, p; (b) the marginal frequency function of x; an[d (c) the 
regression line of 2/ on a?. JT 

14. Assume that a bomber is making a bombing run in the direction ilong the 
positive y axis at a square target 200 feet by 200 feet, whose center is at the origin 
and whose sides are parallel to the coordinate axes. Assume further that the x 
and y errors in repeated bombing runs are normally distributed about !0. {u) If 
the X and y errors are also independently distributed with — cr^, = 400 feet, 
find the probability that the target will be hit on the first run. (h) Under the 
conditions of (a), find the probability of getting at least 1 hit in 10 runs, (c) Under 
these same conditions, how many runs would be needed to make the probabiiity 
at least .9 of getting at least 1 hit on the target? {d) Show why it would be 
difficult to work (a) if the x and y errors were correlated with, say, p =4: i. 





15. How must the frequency function of a normal variable x be modified if x 
is restricted to (a) values larger than > and {b) positive values? 

16. If x and y are independently and normally distributed with $\mu_x = \mu_y = 0$
and $\sigma_x = \sigma_y = 1$, find (a) the probability that $x^2 + y^2 < 1$ and (b) determine what
size circle with the center at the origin is such that the probability is .95 that the
sample point (x, y) will fall inside it.

17. If x and y are independently distributed, show that the curve of regression
will be a horizontal straight line.

18. If x and y are independent variables, find expressions for the mean and
variance of z = xy in terms of the means and variances of x and y.

19. Let n independent trials be made of an experiment for which p is the 

probability of success in a single trial. Let x equal the number of successes and 
let y equal the sum of the numbers of the trials at which successes occur. Write 
a? = + • • • 4- where x^ = \ or 0, depending on success or failure, and 

write 2/ = 2^1 + * ' ’ + 2/n where yi = i or 0, depending on success or failure. 
Calculate E(y\ E(xy), and 

20. Suppose that a binomial distribution is to be truncated by agreeing to 
discard the value x —0 whenever it occurs. Find the resulting frequency function 
of X, that is, find the conditional frequency function of binomial x when 1 ^x 

21. Suppose it is known that x and y are jointly normally distributed with 

means //a, = 4, = 2, (yx = and p = f . If you wish to estimate the 

value of y for an individual whose x value is equal to 6, how will the size of the 
variance of the error of this estimate differ from that when nothing is known 
about his a; value? 

22. In problem 14 suppose the x and y errors are normally distributed about
the point of aiming rather than about the center of the target and that the x and y
coordinates of the aiming point are independently normally distributed about the
center of the target with $\sigma_z = \sigma_w = 100$ feet. Letting x = z + u and y = w + v,
where z and w are the aiming errors and u and v are the bombing errors, solve
part (a) of problem 14 by using the fact that x and y are independently normally
distributed because they are the sums of such variables.

23. Verify by integration that definition (16) is consistent with the general 
moment properties given in (14) and (13). 

24. Construct or describe a joint non-normal distribution of 2 variables whose 
marginal distributions are both normal. 

25. Find a nontrivial joint distribution of 2 variables such that both regression 
curves are straight lines. 

26. Prove that all vertical plane sections of a normal correlation surface are 
normal curves. 

27. Prove that $\bar{x}$ and $\bar{y}$ are the maximum likelihood estimators of $\mu_x$ and $\mu_y$
for the bivariate normal frequency function.

28. Assume that $\mu_x = \mu_y = 0$ and $\sigma_x = \sigma_y = 1$ for x and y jointly normally
distributed. Find an equation whose solution gives the maximum likelihood
estimate of $\rho$ for a sample of size n. How does this result compare with that when
the means and variances are unknown?




29. Show that the maximum likelihood estimates of the /5’s in (2^) are the 
same as the least-squares estimates given by (13), Chapter 7. 

30. Given the conditional distribution $f(y \mid x) = x^ye^{-x}/y!$, where y is discrete
and can assume the values y = 0, 1, 2, ..., and given $f(x) = e^{-x}$, x > 0, show
that the marginal distribution of y is given by $g(y) = (1/2)^{y+1}$. Use the factorial
property of the gamma function integral.

31. The length of life a? of a physical particle is a random variable whose 

distribution depends on a parameter a. This parameter characterizes the type of 
particle. A population of particles is made up of various types of particles with 
the proportion having parameter value a given by ^(a) = a > 0. If the 
distribution of length of life for fixed a is given by | a) = > 0, find 

the unconditional frequency function f(x). 





CHAPTER 9 


General Principles For Testing 
Hypotheses and For Estimation 


9.1 Testing Hypotheses 

A large part of the material presented in the preceding chapters has 
been concerned with testing various statistical hypotheses. These hypoth- 
eses were tested by means of random variables such as x, or p', which 
seemed appropriate for the particular problem being considered. Thus 
X was introduced because it appeared to be a satisfactory variable to use 
for testing a theoretical mean. A random variable such as this, which is a 
function of sample values, is often called a statistic. Now, not only was 
the statistic for a given type of problem selected on intuitive grounds, but 
the critical region for the statistic was also selected on an intuitive basis 
rather than on any logical principle. Although such intuitive arguments 
often yield highly efficient tests for testing the hypothesis in question, 
some logical principle for selecting the proper test is necessary if one is to 
be certain of always designing a good test. Such a principle was introduced
in Chapter 3 for testing a hypothesis $H_0$ against an alternative
hypothesis $H_1$. In this chapter the ideas introduced in Chapter 3 will be
studied more thoroughly and extended somewhat to include more general 
problems. 


9.1.1 Test of a Hypothesis 

From (2) and (4), Chapter 3, it will be recalled that a statistical hy- 
pothesis is defined as an assumption about the frequency function of a 
random variable and that a test of a hypothesis is a procedure for deciding 
whether to accept or reject the hypothesis. 

In all the problems of testing hypotheses that have been considered
thus far the procedure for deciding whether to accept or reject the hypothesis
has consisted in selecting a statistic based on a sample of fixed size
n, calculating the value of the statistic for the sample, and then rejecting
the hypothesis if and only if the value of the statistic corresponded to a
point in the chosen critical region.

A more general procedure that possesses striking advantages in many
situations is one in which the random sample is obtained by selecting one
individual at a time until a sufficiently large sample has been accumulated
to arrive at a reliable decision. This method of sampling, called sequential
sampling, often arrives at a decision some time before the fixed-size
sample, with the same size type I and type II errors, is exhausted, and
thus it often decreases the cost of sampling. In the sequential procedure
one must decide at every stage of the sampling whether to accept the
hypothesis, to reject the hypothesis, or to continue sampling. The fixed-size
sample procedure does not permit any conclusions to be drawn until the
entire sample has been taken and does not permit additional sampling.
A sequential method for testing hypotheses is discussed in Chapter 14;
hence only fixed-size sample procedures are considered in this chapter.


9.1.2 Kinds of Tests 


Most of the statistical hypotheses that one encounters are assumptions
about the parameters of a frequency function. The hypotheses
tested in the preceding chapters have been of this kind. For the purpose
of describing them, let $f(x; \theta_1, \ldots, \theta_k)$ denote a known frequency
function that depends on k parameters. A statistical hypothesis then
becomes an assumption about the k parameters. In studying hypotheses
of this kind it is convenient to classify them into one of two types by
means of the following definition.

(1) Definition: If a hypothesis specifies the values of all the parameters 
of a frequency function, it is called a simple hypothesis; otherwise, it is 
called a composite hypothesis. 

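As an illustration, suppose the frequency function is

\[
f(x; \theta_1, \theta_2) = \frac{e^{-\frac{1}{2}\left(\frac{x - \theta_1}{\theta_2}\right)^2}}{\sqrt{2\pi}\,\theta_2}
\]

If the hypothesis is $H_0\colon \theta_1 = 10,\ \theta_2 = 2$, then $H_0$ is a simple hypothesis.

If, however, the hypothesis is $H_0\colon \theta_1 = 10,\ \theta_2 < 2$, then $H_0$ is composite.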

The theory of how to design good tests for simple hypotheses is much
simpler than that for composite hypotheses. In the next two sections two
methods for constructing good tests are discussed. The first method is
directly applicable to simple hypotheses only, although it sometimes solves
composite problems also, whereas the second method is applicable to
both simple and composite hypotheses.


9.1.3 Best Tests for Simple Hypotheses 

In this section a method is given for constructing best tests, in the sense
of principle (7), Chapter 3, for simple hypotheses. In discussing the relative
merits of different tests, this principle requires that only tests with an
agreed upon type I error size, denoted by $\alpha$, be considered. Then a best
test is defined as a test in this set that minimizes the size of the type II
error, denoted by $\beta$. The method of constructing a best test depends on
the use of a theorem that was first proved and used by the two statisticians
after whom it is named. The theorem, called the Neyman-Pearson
lemma, will be proved for a frequency function, $f(x; \theta)$, of a single continuous
variable and a single parameter; however, by merely thinking of
x and $\theta$ as vectors, the proof will be seen to hold for any number of random
variables and parameters. The variables $x_1, x_2, \ldots, x_n$ occurring in
the theorem are understood to represent a random sample of size n from
the population whose frequency function is $f(x; \theta)$. The theorem is concerned
with a simple hypothesis $H_0\colon \theta = \theta_0$ and a simple alternative
$H_1\colon \theta = \theta_1$. This is the type of problem discussed and illustrated in Chapter 3
beginning with the illustration following (4), Chapter 3. One should
review that material before studying the following. In particular one
should recall that the phrase “critical region of size $\alpha$” means that the
critical region is one for which the size of the type I error is $\alpha$. In terms
of this language, the theorem may be expressed as follows.

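(2) Neyman-Pearson Lemma: If there exists a critical region A of size
$\alpha$ and a constant k such that

\[
\frac{\prod_{i=1}^{n} f(x_i; \theta_1)}{\prod_{i=1}^{n} f(x_i; \theta_0)} \ge k \quad \text{inside } A
\]

and

\[
\frac{\prod_{i=1}^{n} f(x_i; \theta_1)}{\prod_{i=1}^{n} f(x_i; \theta_0)} \le k \quad \text{outside } A
\]

then A is a best critical region of size $\alpha$.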



To prove this lemma, let $A^*$ be any other critical region of size $\alpha$.
The regions A and $A^*$ may be represented geometrically as the regions
interior to the indicated closed surfaces in Fig. 1. For simplicity of
notation, let

\[
L_0 = \prod_{i=1}^{n} f(x_i; \theta_0)
\]

denote the frequency function for the variables $x_1, x_2, \ldots, x_n$ when $H_0$
is true, and let $L_1$ denote this function when $H_1$ is true. Further, write

\[
\int \cdots \int_A \prod_{i=1}^{n} f(x_i; \theta_0)\,dx_1 \cdots dx_n = \int_A L_0\,dx
\]

with a similar expression for $A^*$.
Since A and $A^*$ are both critical regions of size $\alpha$,

\[
\int_A L_0\,dx = \int_{A^*} L_0\,dx \tag{3}
\]

But from Fig. 1 it is clear that the integral over b, which is the common
part of A and $A^*$, will cancel from both sides of (3) and reduce it to the
form

\[
\int_a L_0\,dx = \int_c L_0\,dx \tag{4}
\]

Here a denotes the part of A that lies outside $A^*$ and c denotes the part
of $A^*$ that lies outside A.


Now, calculate the size of the type II error for both A and $A^*$. Since
the size of the type II error is the probability that the sample point will
fall outside the critical region when $H_1$ is true, which in turn is equal to
1 minus the probability that it will fall inside the critical region when $H_1$
is true, these errors may be written in the form

\[
\beta^* = 1 - \int_{A^*} L_1\,dx
\]

and

\[
\beta = 1 - \int_A L_1\,dx
\]

Consequently

\[
\beta^* - \beta = \int_A L_1\,dx - \int_{A^*} L_1\,dx
\]

If the integral over the common part b is canceled, this difference will
reduce to

\[
\beta^* - \beta = \int_a L_1\,dx - \int_c L_1\,dx \tag{5}
\]


Since region a lies in A, it follows from the definition of A given in (2)
that every point of a satisfies the inequality

\[
kL_0 \le L_1
\]

hence

\[
\int_a L_1\,dx \ge k\int_a L_0\,dx
\]

Similarly, since c lies outside A, every point of c satisfies the second
inequality in (2), namely,

\[
kL_0 \ge L_1
\]

hence

\[
\int_c L_1\,dx \le k\int_c L_0\,dx
\]

When these two results are used in (5), it follows that

\[
\beta^* - \beta \ge k\int_a L_0\,dx - k\int_c L_0\,dx
\]

But from (4) the right side must be equal to zero; hence

\[
\beta^* \ge \beta
\]


Since $\beta^*$ is the size of the type II error for any critical region of size $\alpha$,
other than A, the preceding analysis proves that A is a best critical region
of size $\alpha$, where best is understood to mean a critical region with a minimum
size type II error.

The constant k of this lemma is chosen to make A a critical region of
size $\alpha$. In most problems, as k goes from 0 to infinity, the size of A
decreases from 1 to 0, thus making it possible to determine the proper
value of k.

The usefulness and meaning of this lemma is best explained by means
of illustrations. Consider first the problem that was discussed in Chapter
3, beginning with (3). For that problem

\[
f(x; \theta) = \theta e^{-\theta x}, \quad x \ge 0
\]

In order to discuss a somewhat more general problem let the hypothesis
be $H_0\colon \theta = \theta_0$, let the alternative be $H_1\colon \theta = \theta_1 < \theta_0$, and assume that a
sample of size n is to be taken. The corresponding likelihood functions
are

\[
L_0 = \prod_{i=1}^{n} f(x_i; \theta_0) = \theta_0^n e^{-\theta_0 \sum x_i}
\]

and

\[
L_1 = \prod_{i=1}^{n} f(x_i; \theta_1) = \theta_1^n e^{-\theta_1 \sum x_i}
\]

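According to (2), the region A is the region in which

\[
\frac{\theta_1^n e^{-\theta_1 \sum x_i}}{\theta_0^n e^{-\theta_0 \sum x_i}} \ge k
\]

This inequality may be written in the form

\[
e^{(\theta_1 - \theta_0)\sum x_i} \le \frac{1}{k}\left(\frac{\theta_1}{\theta_0}\right)^n
\]

Taking logarithms, the inequality becomes

\[
(\theta_1 - \theta_0)\sum x_i \le \log\left[\frac{1}{k}\left(\frac{\theta_1}{\theta_0}\right)^n\right]
\]

Since $H_1$ specifies that $\theta_1 < \theta_0$, dividing both sides by $\theta_1 - \theta_0$ will reverse
the inequality and yield

\[
\sum x_i \ge \frac{1}{\theta_1 - \theta_0}\log\left[\frac{1}{k}\left(\frac{\theta_1}{\theta_0}\right)^n\right] \tag{6}
\]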

Now the problem of Chapter 3 has n = 1, $\theta_0 = 2$, and $\theta_1 = 1$; hence
for that problem the best critical region, as given by (6), would be that
part of the x axis to the right of the point

\[
x_0 = \log 2k
\]

where k is chosen to make the probability .135 that x will exceed $x_0$
when $H_0$ is true. Thus the right tail, which was shown in Chapter 3 to be
better than the left tail for that problem, is now shown to be the best
possible critical region for that problem.
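The comparison of the two tails can be reproduced with a few lines of Python. The sketch below assumes, as in the illustration, the frequency function $\theta e^{-\theta x}$ with n = 1, $\theta_0 = 2$, $\theta_1 = 1$, and $\alpha = .135$; the variable names are illustrative. It finds the boundary of each tail region from the requirement that its size be $\alpha$, then computes the size of the type II error for each.

```python
import math

theta0, theta1, alpha = 2.0, 1.0, 0.135

def likelihood_ratio(x):
    """L1 / L0 for a single observation x from theta * exp(-theta * x)."""
    return (theta1 * math.exp(-theta1 * x)) / (theta0 * math.exp(-theta0 * x))

# Right tail x > x0: size is P(x > x0 | theta0) = exp(-theta0 * x0) = alpha.
x0 = -math.log(alpha) / theta0
beta_right = 1.0 - math.exp(-theta1 * x0)      # P(x <= x0 | theta1)

# Left tail x < c: size is P(x < c | theta0) = 1 - exp(-theta0 * c) = alpha.
c = -math.log(1.0 - alpha) / theta0
beta_left = math.exp(-theta1 * c)              # P(x >= c | theta1)

# Constant k of the lemma: the value of the likelihood ratio on the boundary.
k = likelihood_ratio(x0)

print("x0 =", x0, "k =", k)
print("type II error, right tail:", beta_right)
print("type II error, left tail:", beta_left)
```

The right tail begins very nearly at 1 and gives a type II error of about .63, whereas the left tail of the same size gives about .93. On the boundary the likelihood ratio equals k, so that the boundary point also satisfies $x_0 = \log 2k$.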

The derivation that led to (6) does not depend on the particular value
of $\theta_1$, provided that $\theta_1 < \theta_0$. Thus the same critical region is used whatever
the value of $\theta_1$, as long as $\theta_1 < \theta_0$. The value of k necessary to produce
the same $\alpha$ for (6) will, of course, depend on the value of $\theta_1$. This
discussion shows that (6) gives the best critical region for testing the
hypothesis $H_0\colon \theta = \theta_0$ against the composite alternative $H_1\colon \theta < \theta_0$.

Thus the Neyman-Pearson lemma, although designed to test a simple
hypothesis against a simple alternative, can sometimes be used to solve a
problem in which the alternative hypothesis is composite. From this
result it follows that the critical region selected for the problem discussed
in Chapter 3 is the best critical region for testing $H_0\colon \theta = 2$ against
$H_1\colon \theta < 2$. This form of the alternative hypothesis would undoubtedly
be much more realistic and satisfying to the experimenter than the original
alternative $H_1\colon \theta = 1$.

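As a second illustration, consider the problem of testing whether a
normal population with unit variance has a mean $\theta = \theta_0$ or a mean
$\theta = \theta_1 < \theta_0$. Here

\[
f(x; \theta) = \frac{e^{-\frac{1}{2}(x - \theta)^2}}{\sqrt{2\pi}}
\]

Then

\[
L_0 = \prod_{i=1}^{n} f(x_i; \theta_0) = (2\pi)^{-n/2}\,e^{-\frac{1}{2}\sum(x_i - \theta_0)^2}
\]

and

\[
L_1 = \prod_{i=1}^{n} f(x_i; \theta_1) = (2\pi)^{-n/2}\,e^{-\frac{1}{2}\sum(x_i - \theta_1)^2}
\]

The region A in (2) is therefore the region in which

\[
\frac{e^{-\frac{1}{2}\sum(x_i - \theta_1)^2}}{e^{-\frac{1}{2}\sum(x_i - \theta_0)^2}} \ge k
\]

If logarithms are taken, this inequality will reduce to

\[
\sum(x_i - \theta_0)^2 - \sum(x_i - \theta_1)^2 \ge 2\log k
\]

Simplification of the left side will produce the form

\[
2(\theta_1 - \theta_0)\sum x_i \ge 2\log k + n(\theta_1^2 - \theta_0^2)
\]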



If both sides are divided by $2n(\theta_1 - \theta_0)$, which is a negative number
because it was assumed that $\theta_1 < \theta_0$, this inequality will finally
reduce to

(7)    $$\bar{x} \le \frac{2\log k + n(\theta_1^2 - \theta_0^2)}{2n(\theta_1 - \theta_0)}$$

By choosing $k$ properly, the quantity on the right can be made to have a
value $\bar{x}_0$ such that the probability that $\bar{x}$ will be less than $\bar{x}_0$ when
$H_0$ is true will be equal to, say, $\alpha = .05$. Thus the best critical region
here is the left tail of the $\bar{x}$ distribution. This is the region that was
chosen on an intuitive basis for the problem of this type discussed in 6.5.1.
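
As a rough numerical sketch of how the critical value $\bar{x}_0$ could be located in practice, the following Python fragment uses the standard library's `NormalDist`. The function name and the values $\theta_0 = 10$, $n = 25$ are assumptions made purely for illustration; they do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def left_tail_critical_mean(theta0, n, alpha=0.05):
    """Return xbar0 with P(xbar < xbar0 | theta = theta0) = alpha.

    Assumes a normal population with unit variance, so that the sample
    mean is normal with mean theta0 and standard deviation 1/sqrt(n).
    """
    z_alpha = NormalDist().inv_cdf(alpha)  # negative when alpha < 0.5
    return theta0 + z_alpha / sqrt(n)


# Hypothetical values, chosen only for illustration.
theta0, n = 10.0, 25
xbar0 = left_tail_critical_mean(theta0, n, alpha=0.05)

# Check the size of the region xbar < xbar0 under H0.
size = NormalDist(mu=theta0, sigma=1 / sqrt(n)).cdf(xbar0)
print(round(xbar0, 3), round(size, 3))
```

Note that the critical value depends only on $\theta_0$, $n$, and $\alpha$, not on the alternative value $\theta_1$, which is why the same region serves every alternative below $\theta_0$.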

As in the first illustration of this section, it will be observed that the
critical region obtained by applying (2) is the same for all alternative
values $\theta_1$, provided that $\theta_1 < \theta_0$, and is the best critical region for the
more general composite alternative $H_1: \theta < \theta_0$.

If $\theta_1 > \theta_0$, inequality (7) will be reversed; consequently the best critical
region will consist of the right tail of the $\bar{x}$ distribution. This critical
region will also be best for the composite alternative $H_1: \theta > \theta_0$.

If one wished to test $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$, there would be no
best critical region for all possible alternative values $\theta_1$ because when
$\theta_1 < \theta_0$ the left tail will be best, whereas when $\theta_1 > \theta_0$ the right tail will
be best. The preceding result is typical; best critical regions usually
exist only if the alternative values of the parameter are suitably restricted.

As a final illustration, consider a discrete variable problem. Although
lemma (2) was proved for continuous variables, the same proof will
apply to discrete variables if one replaces integrals by sums. A certain
difficulty arises with discrete variable problems in that there may be very
few, or no other, critical regions having the same value of $\alpha$ as that for a
selected critical region. If this were true, it would be academic to say that
a certain critical region is a best critical region of size $\alpha$. These
possibilities are considered in the following illustration.

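Let $x$ possess a Poisson distribution with mean $\mu$ and let the hypothesis
$H_0: \mu = \mu_0$ be tested against the alternative hypothesis $H_1: \mu = \mu_1 < \mu_0$.
By proceeding as for continuous variables,

$$\frac{L_1}{L_0} = \frac{\prod_{i=1}^{n} \dfrac{e^{-\mu_1}\mu_1^{x_i}}{x_i!}}{\prod_{i=1}^{n} \dfrac{e^{-\mu_0}\mu_0^{x_i}}{x_i!}} = e^{-n(\mu_1-\mu_0)}\left(\frac{\mu_1}{\mu_0}\right)^{\sum x_i}$$

The inequality

$$\frac{L_1}{L_0} \ge k$$

is equivalent to the inequality

$$\sum_{i=1}^{n} x_i \,\log\frac{\mu_1}{\mu_0} \ge \log k + n(\mu_1 - \mu_0)$$

Since $\log(\mu_1/\mu_0) < 0$ because it was assumed that $\mu_1 < \mu_0$, the preceding
inequality can be written

$$\sum_{i=1}^{n} x_i \le \frac{\log k + n(\mu_1 - \mu_0)}{\log\mu_1 - \log\mu_0}$$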

It was shown in Chapter 6 that the sum of independent Poisson variables
is a Poisson variable with its mean equal to the sum of the means;
it therefore follows that the variable $z = \sum x_i$ is a Poisson variable with
mean $n\mu$. The critical region determined by the preceding inequality is
therefore equivalent to a critical region of the type $z \le z_0$ for the Poisson
variable $z$, where $z_0$ is chosen to make the region one of size $\alpha$.

This is where the difficulty with discrete variable problems arises.
Since the sample space for a Poisson variable consists of the points
$z = 0, 1, 2, \dots$, the critical region $z \le z_0$ is constructed by starting with
the point $z = 0$ and adding successive points $z = 1$, $z = 2$, etc., until the
sum of the probabilities for those points under $H_0$ is equal to $\alpha$. But it is
unlikely that this sum will exactly equal a previously specified $\alpha$ value.
This unsatisfactory state of affairs can be overcome by employing what
is known as a randomization device. Suppose, for example, that $\alpha = .05$
and that the Poisson probabilities under $H_0$ corresponding to $z = 0, 1,
2, \dots$ are .018, .072, .144, $\dots$. Choosing $z = 0$ as the critical region
makes $\alpha = .018$, whereas choosing it to consist of the two points $z = 0$
and $z = 1$ makes $\alpha = .018 + .072 = .090$. The randomization device
that will yield a value of $\alpha = .05$ consists in agreeing to reject $H_0$ when
$z = 0$ but to reject $H_0$ only a certain proportion of the time when $z = 1$.
The proper proportion here is $p$, where $p$ satisfies the equation
$.018 + .072p = .05$. The solution of this equation is $p = 4/9$; consequently,
in carrying out the test, one would consult a table of random numbers, or
use some game of chance that would yield successes $4/9$ of the time, to
determine whether to place $z = 1$ in the critical region when the value
$z = 1$ is obtained. By using such randomization devices, it is possible to
discuss best tests and to apply lemma (2) to discrete variable problems in
much the same manner as for continuous variables. In practical applications
with discrete variables one usually dispenses with these devices and chooses
a critical region whose size is possible and close to the desired $\alpha$ value.
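
The arithmetic behind the randomization device can be sketched in a few lines of Python. The helper name `randomization_proportion` is hypothetical; the probabilities .018, .072, .144 and $\alpha = .05$ are the rounded values quoted in the text, and exact fractions are used so that no rounding error enters.

```python
from fractions import Fraction


def randomization_proportion(probs, alpha):
    """Return (z_boundary, p) for a left-tail test on z = 0, 1, 2, ...

    H0 is rejected outright when z < z_boundary and rejected with
    probability p when z == z_boundary, giving a test of exact size alpha.
    """
    cumulative = Fraction(0)
    for z, prob in enumerate(probs):
        if cumulative + prob > alpha:
            return z, (alpha - cumulative) / prob
        cumulative += prob
    raise ValueError("alpha exceeds the total probability supplied")


# Rounded Poisson probabilities under H0, as quoted in the text.
probs = [Fraction(18, 1000), Fraction(72, 1000), Fraction(144, 1000)]
alpha = Fraction(5, 100)

z_boundary, p = randomization_proportion(probs, alpha)
size = probs[0] + p * probs[1]
print(z_boundary, p, size)  # prints: 1 4/9 1/20
```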


9.1.4 Likelihood Ratio Tests 

When the Neyman-Pearson lemma fails to yield a best test, or when
the hypothesis is composite rather than simple, it is necessary to place
further restrictions on the class of tests and then attempt to find a best
test in this restricted class or to introduce some other principle for
obtaining good tests. In this section a second principle for constructing
good tests is introduced and discussed. Since any method for testing
composite hypotheses will include the testing of simple hypotheses as a
special case, this principle is introduced from the point of view of
composite hypotheses.

Suppose that the variable $x$ has a frequency function $f(x; \theta_1, \dots, \theta_k)$
that depends on $k$ parameters. Let the composite hypothesis to be tested
be denoted by $H_0: \theta_i = \theta_i'$ $(i = 1, 2, \dots, k)$, where $\theta_i'$ may or may not
denote a numerical value. Thus, if there are two parameters, $H_0$ might
be the hypothesis that $\theta_1 = 10$ with $\theta_2$ unspecified; then $\theta_1' = 10$ and
$\theta_2' = \theta_2$. As a second illustration, $H_0$ might be the hypothesis that
$\theta_1 = \theta_2$; then $\theta_1' = \theta_1$ and $\theta_2' = \theta_1$. With the aid of this notation,
$f(x; \theta_1', \dots, \theta_k')$ will denote the frequency function of $x$ when $H_0$ is
true.

Let $\hat{\theta}_i$ denote the maximum likelihood estimator of $\theta_i$ for the likelihood
function $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta_1, \dots, \theta_k)$, where the likelihood function is
treated as a function of the parameters and the $x_i$ are fixed. Similarly,
let $\hat{\theta}_i'$ denote the maximum likelihood estimator of $\theta_i'$ when $H_0$ is true;
that is, for the likelihood function $L(\theta') = \prod_{i=1}^{n} f(x_i; \theta_1', \dots, \theta_k')$. Now,
form the ratio

(8)    $$\lambda = \frac{L(\hat{\theta}')}{L(\hat{\theta})}$$

This is the ratio of the two likelihood functions $L(\theta')$ and $L(\theta)$ when
their parameters have been replaced by their maximum likelihood
estimators. Since the maximum likelihood estimators are functions of the
random variables $x_1, x_2, \dots, x_n$, the ratio $\lambda$ is a function of
$x_1, x_2, \dots, x_n$ only and is therefore an observable random variable.

The denominator of $\lambda$ is the maximum of the likelihood function with
respect to all the parameters, whereas the numerator is the maximum
only after some or all of the parameters have been restricted by $H_0$;
consequently it is clear that the numerator cannot exceed the denominator
in value and therefore that $\lambda$ can assume values between 0 and 1 only.
Now the likelihood function gives the probability density (or probability
in case $x$ is a discrete variable) at the sample point $x_1, x_2, \dots, x_n$.
Therefore, if $\lambda$ is close to 1, it follows that the probability density (or
probability) of the sample point could not be increased much by allowing
the parameters to assume values other than those possible under $H_0$;
consequently, a value of $\lambda$ near 1 corresponds intuitively to considerable
belief in the reasonableness of the hypothesis $H_0$. If, however, the value of
$\lambda$ is close to 0, it implies that the probability density (or probability) of
the sample point is very low under $H_0$ as contrasted to its value under
certain other possible values of the parameters not permitted under $H_0$, and
therefore a value of $\lambda$ near 0 corresponds to considerable belief in the
unreasonableness of the hypothesis. If increasing values of $\lambda$ are treated as
corresponding to increasing degrees of belief in the truth of the hypothesis,
then $\lambda$ may serve as a statistic for testing $H_0$, with small values of $\lambda$
leading to the rejection of $H_0$.

Now suppose that $H_0$ is true and the frequency function of the random
variable $\lambda$, say $g(\lambda)$, has been found. This is theoretically possible if the
explicit form of $f(x; \theta_1', \dots, \theta_k')$ is known. Suppose, further, that $g(\lambda)$
does not depend on any unknown parameters. Then one can find a value
of $\lambda$, call it $\lambda_0$, such that

(9)    $$\int_0^{\lambda_0} g(\lambda)\, d\lambda = \alpha$$

The critical region of size $\alpha$ for testing $H_0$ by means of the statistic $\lambda$
then is chosen to be the interval $0 \le \lambda \le \lambda_0$.

The preceding explanation of how likelihood ratio tests are constructed 
may be summarized in the following form. 


(10) Likelihood Ratio Tests: To test a hypothesis $H_0$, simple or
composite, use the statistic $\lambda$ given by (8) and reject $H_0$ if, and only if,
the sample value of $\lambda$ satisfies the inequality $\lambda \le \lambda_0$, where $\lambda_0$ is
given by (9).


There is a great deal of similarity between the techniques used to obtain 
a best test and a likelihood ratio test. They both use the ratio of the two 
likelihood functions as a basis for making decisions. This similarity may 
be observed by comparing (2) and (8). 

Although the use of $\lambda$ as a statistic for testing hypotheses has been
justified largely on intuitive grounds, it can be shown that such tests
possess several very desirable properties. These properties will be
discussed briefly after a few illustrations have been given on how to
construct likelihood ratio tests.

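Consider the second illustration of the preceding section, namely,
the problem of testing the hypothesis $H_0: \theta = \theta_0$, where

$$f(x; \theta) = \frac{e^{-\frac{1}{2}(x-\theta)^2}}{\sqrt{2\pi}}$$

Here

$$L(\theta) = (2\pi)^{-n/2}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2}$$

Since $L(\theta)$ will be maximized if $\log L(\theta)$ is maximized, it will suffice to
maximize $\log L(\theta)$. But

$$\frac{\partial \log L(\theta)}{\partial \theta} = \sum_{i=1}^{n}(x_i - \theta)$$

hence $\hat{\theta} = \bar{x}$, and therefore

$$L(\hat{\theta}) = (2\pi)^{-n/2}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

Since there are no parameters to be estimated under $H_0$,

$$L(\hat{\theta}') = L(\theta') = (2\pi)^{-n/2}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta_0)^2}$$

Then $\lambda$, as given by (8), becomes

$$\lambda = e^{-\frac{1}{2}\left[\sum_{i=1}^{n}(x_i-\theta_0)^2 - \sum_{i=1}^{n}(x_i-\bar{x})^2\right]}$$

Upon simplifying the exponent, $\lambda$ reduces to

(11)    $$\lambda = e^{-\frac{n}{2}(\bar{x}-\theta_0)^2}$$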






Now $n$ and $\theta_0$ are known constants; hence (11) expresses a relationship
between $\lambda$ and $\bar{x}$. By means of this relationship the critical value
$\lambda_0$ can be determined without finding $g(\lambda)$. The nature of the relationship
expressed by (11) is most easily seen graphically, as in Fig. 2. To each
value of $\lambda$ correspond two values of $\bar{x}$, which are symmetrical with respect
to $\bar{x} = \theta_0$. There are therefore two critical values of $\bar{x}$ corresponding to
the critical value of $\lambda = \lambda_0$. Figure 2 also shows that increasingly small
values of $\lambda$ correspond to increasingly large values of $|\bar{x} - \theta_0|$. Therefore
the 5 per cent critical region for $\lambda$, consisting of the interval $0 \le \lambda \le \lambda_0$,
will correspond to the two 2½ per cent tails of the normal $\bar{x}$ distribution.
Thus the 5 per cent critical region for the likelihood ratio test is equivalent
to the two equal tails of the $\bar{x}$ distribution given by the familiar inequality
$|\bar{x} - \theta_0|\sqrt{n} > 1.96$. For this problem the likelihood ratio test is precisely
the same test as the two-tailed test selected on intuitive grounds in preceding
chapters.
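
The equivalence between the $\lambda$ region and the two-tailed $\bar{x}$ region can be checked numerically. The sketch below computes $\lambda$ directly from the two likelihoods and again from the closed form (11); the sample values and $\theta_0 = 5$ are hypothetical and serve only as an illustration.

```python
from math import exp, sqrt


def likelihood_ratio(xs, theta0):
    """lambda = L(theta0) / L(xbar) for a normal sample with unit variance."""
    n = len(xs)
    xbar = sum(xs) / n
    ss_null = sum((x - theta0) ** 2 for x in xs)  # exponent sum under H0
    ss_mle = sum((x - xbar) ** 2 for x in xs)     # exponent sum at the MLE
    return exp(-0.5 * (ss_null - ss_mle))


# Hypothetical sample, for illustration only.
xs = [4.1, 5.3, 6.0, 4.8, 5.9, 6.4, 5.2, 4.7, 5.8]
theta0 = 5.0
n = len(xs)
xbar = sum(xs) / n

lam = likelihood_ratio(xs, theta0)
lam_from_11 = exp(-0.5 * n * (xbar - theta0) ** 2)  # closed form (11)

# The 5 per cent critical value of lambda that corresponds to 1.96.
lam0 = exp(-0.5 * 1.96 ** 2)
reject_by_lambda = lam <= lam0
reject_by_mean = abs(xbar - theta0) * sqrt(n) >= 1.96
print(round(lam, 4), round(lam_from_11, 4), reject_by_lambda, reject_by_mean)
```

Both decision rules agree because $\lambda$ is a decreasing function of $|\bar{x} - \theta_0|$.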

The preceding illustration was concerned with testing a simple hypoth- 
esis and was selected for the purpose of comparing the result of applying 
the Neyman-Pearson lemma for best tests with that obtained by the 
likelihood ratio approach. It will be recalled that a best test exists for this 
problem only for one-sided alternatives, $\theta_1 > \theta_0$ or $\theta_1 < \theta_0$; hence the
likelihood ratio test cannot be a best test. It serves as a compromise 
test when there is no restriction placed on the alternative values of 6. 

In the preceding illustration it was not necessary to find the distribution
of $\lambda$ because it turned out that $\lambda$ was a simple function of $\bar{x}$, whose
distribution is known. In general, however, there is no assurance that
some such nice relationship to a familiar variable will exist. Then one
must use whatever tools he has available in an effort to find the
distribution of $\lambda$. Fortunately, for large samples there is a good
approximation to the distribution of $\lambda$ which eliminates the necessity for
finding the exact distribution. This result from the advanced theory of
statistics may be expressed in the form of a theorem.

(12) Theorem: Under certain regularity conditions, the random variable
$-2\log_e\lambda$, where $\lambda$ is given by (8), has a distribution that approaches that
of a $\chi^2$ variable as $n$ becomes infinite, with its degrees of freedom equal to
the number of parameters that are determined by the hypothesis $H_0$.

Since small values of $\lambda$ correspond to large values of $-2\log_e\lambda$, it
follows that the critical region for a test based on $-2\log_e\lambda$ will consist
of large values of this variable. If the borderline of a critical region for
the $\chi^2$ variable $-2\log_e\lambda$ is denoted by $\chi_0^2$, then $\chi_0^2$ must be a number
such that $P\{\chi^2 > \chi_0^2\} = \alpha$. Thus, in order to determine the critical
region for this approximate likelihood ratio test, it is necessary to have a
table of critical values, $\chi_0^2$. Table III in Appendix 2 enables one to find
such critical values. Since the $\chi^2$ distribution as given in (20), Chapter 6,
depends on the parameter $\nu$, called the number of degrees of freedom,
any critical value will depend on $\nu$. Graphs of the $\chi^2$ frequency function
corresponding to several values of $\nu$ are given in Fig. 7, Chapter 6.

According to theorem (12), the number of degrees of freedom in the
approximating $\chi^2$ distribution for the illustration considered earlier is
$\nu = 1$ because only a single parameter was determined by $H_0$. Furthermore,
from (11) the value of $-2\log_e\lambda$ is $n(\bar{x} - \theta_0)^2$. From (12) it therefore
follows that the critical region for this approximate test is the region
in which $n(\bar{x} - \theta_0)^2 > \chi_0^2$. This is the same critical region as that
obtained from the exact likelihood ratio test. It should be noted that
$\sqrt{n}(\bar{x} - \theta_0)$ is a standard normal variable because the standard deviation
of $\bar{x}$ is $1/\sqrt{n}$ here; therefore from 5.4.3.1 its square is a $\chi^2$ variable with
1 degree of freedom. Thus this theorem is seen to check with the known
exact distribution here.
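
A small simulation gives an informal check that $n(\bar{x} - \theta_0)^2$ exceeds the 5 per cent point of $\chi^2$ with 1 degree of freedom about 5 per cent of the time when $H_0$ is true. This is only a sketch: the sample size, number of trials, and seed are arbitrary assumptions, and the critical value is obtained as the square of the normal 2½ per cent point rather than from a $\chi^2$ table.

```python
import random
from statistics import NormalDist


def rejection_rate(theta0, n, trials, seed=0):
    """Fraction of simulated samples with n*(xbar - theta0)^2 > chi2 critical value."""
    rng = random.Random(seed)
    # 5 per cent point of chi-square with 1 d.f. = (1.96...)^2, about 3.84.
    chi2_crit = NormalDist().inv_cdf(0.975) ** 2
    rejections = 0
    for _ in range(trials):
        # Sample of size n from a unit-variance normal with mean theta0 (H0 true).
        xbar = sum(rng.gauss(theta0, 1.0) for _ in range(n)) / n
        if n * (xbar - theta0) ** 2 > chi2_crit:
            rejections += 1
    return rejections / trials


rate = rejection_rate(theta0=0.0, n=10, trials=20000)
print(rate)  # should be close to 0.05
```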

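9.1.4.1 Testing the Equality of Variances. Consider an illustration
involving the testing of a composite hypothesis. Let $x_1, x_2, \dots, x_k$ be $k$
independent normally distributed variables with means $\mu_1, \dots, \mu_k$ and
variances $\sigma_1^2, \dots, \sigma_k^2$. Let random samples of sizes $n_1, n_2, \dots, n_k$
be drawn from these populations and let the hypothesis to be tested be

$$H_0: \sigma_1^2 = \cdots = \sigma_k^2$$

The random variable corresponding to the $j$th observation for the variable
$x_i$ is represented by $x_{ij}$. Thus there are altogether $\sum n_i = n$ random
variables. Since

$$f(x_{ij}; \mu_i, \sigma_i) = \frac{e^{-\frac{1}{2}\left(\frac{x_{ij}-\mu_i}{\sigma_i}\right)^2}}{\sqrt{2\pi}\,\sigma_i}$$

the likelihood function may be written as

(13)    $$L(x; \mu, \sigma) = \frac{e^{-\frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(\frac{x_{ij}-\mu_i}{\sigma_i}\right)^2}}{(2\pi)^{n/2}\,\sigma_1^{n_1}\cdots\sigma_k^{n_k}}$$

When $H_0$ is true, (13) reduces to

(14)    $$L(x; \mu', \sigma') = \frac{e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2}}{(2\pi)^{n/2}\,\sigma^{n}}$$

where $\sigma$ on the right side of (14) represents the common value of the $\sigma_i$.
In order to calculate $\lambda$, it is necessary to maximize (13) and (14) with
respect to their parameters. This is accomplished by first taking logarithms
of both sides and then maximizing the logarithms. If (13) and (14) are
denoted by $L$ and $L_0$, respectively, then it will be found that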

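$$\frac{\partial \log L}{\partial \mu_i} = \frac{1}{\sigma_i^2}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)$$

$$\frac{\partial \log L}{\partial \sigma_i} = -\frac{n_i}{\sigma_i} + \frac{1}{\sigma_i^3}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2$$

$$\frac{\partial \log L_0}{\partial \mu_i} = \frac{1}{\sigma^2}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)$$

$$\frac{\partial \log L_0}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2$$

From the first and third of these derivatives it follows that the maximum
likelihood estimators for $\mu_i$ are in each case given by $\hat{\mu}_i = \bar{x}_i$. From
the second and fourth of these derivatives, together with the results just
obtained, it follows that the respective maximum likelihood estimators
for the variances are given by

$$\hat{\sigma}_i^2 = \sum_{j=1}^{n_i}\frac{(x_{ij}-\bar{x}_i)^2}{n_i} = s_i^2$$

and

$$\hat{\sigma}^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\frac{(x_{ij}-\bar{x}_i)^2}{n} = \sum_{i=1}^{k}\frac{n_i s_i^2}{n}$$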


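If these estimators are substituted in (13) and (14), respectively, $L$ and
$L_0$ will become

$$L = \frac{e^{-n/2}}{(2\pi)^{n/2}\, s_1^{n_1}\cdots s_k^{n_k}}$$

and

$$L_0 = \frac{e^{-n/2}}{(2\pi)^{n/2}\left[\sum_{i=1}^{k}\dfrac{n_i s_i^2}{n}\right]^{n/2}}$$

The likelihood ratio given by (8) will then reduce to

(15)    $$\lambda = \frac{s_1^{n_1}\cdots s_k^{n_k}}{\left[\dfrac{n_1 s_1^2 + \cdots + n_k s_k^2}{n}\right]^{n/2}}$$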


If now the frequency function $g(\lambda)$ were available under $H_0$ and $g(\lambda)$
did not depend on any unknown parameters, it would be possible to find a
critical value $\lambda_0$ for deciding whether to accept or reject the hypothesis
that the $k$ populations possess equal variances. However, because of
the complexity of the problem, it is necessary to resort to the
approximation for the distribution of $\lambda$ given in (12).

If $H_0$ had specified that the variances all had a certain known value,
then the degrees of freedom here would have been $k$; however, the
variances are only assumed to be equal in value; therefore the number
of degrees of freedom is $k - 1$.

Studies made on the accuracy of the $\chi^2$ approximation when applied
to (15) have shown that a more accurate test, particularly for small values
of the $n_i$, can be constructed by altering (15) somewhat. Although this
chapter is concerned with the general theory of testing hypotheses and
estimation, the problem of testing the homogeneity of variances arises so
frequently and is so important that it may be worthwhile to display the
more refined test here. The altered form of (15) consists in treating

(16)    $$\frac{-2\log_e\mu}{1 + \dfrac{1}{3(k-1)}\left[\sum_{i=1}^{k}\dfrac{1}{n_i-1} - \dfrac{1}{\sum_{i=1}^{k}(n_i-1)}\right]}$$

as a variable having a $\chi^2$ distribution with $k - 1$ degrees of freedom, where
$\mu$ is given by

$$\mu = \frac{\prod_{i=1}^{k}\left[\dfrac{n_i s_i^2}{n_i-1}\right]^{\frac{n_i-1}{2}}}{\left[\dfrac{\sum_{i=1}^{k} n_i s_i^2}{\sum_{i=1}^{k}(n_i-1)}\right]^{\frac{1}{2}\sum_{i=1}^{k}(n_i-1)}}$$

As a numerical illustration of this test, consider the problem of testing
whether the variability of a manufactured product which is assumed to
be normally distributed has remained constant over a period of five
weeks as judged by the following five weekly sample variances based on
samples of five each: $s_1^2 = 237$, $s_2^2 = 320$, $s_3^2 = 853$, $s_4^2 = 296$,
$s_5^2 = 144$. Here $n_i = 5$ $(i = 1, \dots, 5)$; hence

$$\mu = \frac{\prod_{i=1}^{5}(s_i^2)^2}{\left[\dfrac{\sum_{i=1}^{5} s_i^2}{5}\right]^{10}}$$

Then

$$\log_e\mu = 2\sum_{i=1}^{5}\log_e s_i^2 - 10\log_e\frac{\sum_{i=1}^{5} s_i^2}{5} = -1.844$$

Further computations yield the value of 3.35 for (16). Since the 5 per cent
critical value of $\chi^2$ for $k - 1 = 4$ degrees of freedom is $\chi_0^2 = 9.49$, this
result shows that the hypothesis of homogeneity is a reasonable one as
far as these data are concerned. The unimproved likelihood ratio test
given by (15) would have yielded a value of $-2\log_e\lambda = 4.6$. The fairly
large difference in the numerical values of these two variables, which are
assumed to possess the same approximate $\chi^2$ distribution, is due to the
small values of the $n_i$.
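
The numerical work in this illustration can be reproduced with a short Python sketch. The function name is hypothetical, and the correction factor is written in the usual form of this refined (Bartlett-type) test, which is assumed here to be what (16) denotes. The fifth variance is taken as 144, the value that reproduces the quoted $\log_e\mu = -1.844$; this too is an assumption about the data.

```python
from math import log


def variance_homogeneity_statistics(sample_vars, sizes):
    """Return (log mu, refined statistic (16), -2 log lambda from (15)).

    sample_vars are the divide-by-n_i variances s_i^2; sizes are the n_i.
    """
    k = len(sizes)
    n = sum(sizes)
    dof = [m - 1 for m in sizes]
    total_dof = sum(dof)
    weighted_sum = sum(m * s2 for m, s2 in zip(sizes, sample_vars))

    # -2 log lambda from (15): n log(pooled) - sum n_i log s_i^2.
    minus2_log_lambda = n * log(weighted_sum / n) - sum(
        m * log(s2) for m, s2 in zip(sizes, sample_vars)
    )

    # log mu: same form with n_i replaced by n_i - 1 and unbiased variances.
    unbiased = [m * s2 / (m - 1) for m, s2 in zip(sizes, sample_vars)]
    log_mu = 0.5 * (
        sum(d * log(u) for d, u in zip(dof, unbiased))
        - total_dof * log(weighted_sum / total_dof)
    )

    correction = 1 + (sum(1 / d for d in dof) - 1 / total_dof) / (3 * (k - 1))
    return log_mu, -2 * log_mu / correction, minus2_log_lambda


variances = [237, 320, 853, 296, 144]  # fifth value assumed to be 144
sizes = [5, 5, 5, 5, 5]
log_mu, refined, unrefined = variance_homogeneity_statistics(variances, sizes)
print(round(log_mu, 3), round(refined, 2), round(unrefined, 1))
```

Both statistics fall well below 9.49, so either form of the test leads to the same conclusion for these data.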

Although the problem of testing hypotheses, both simple and composite,
would appear to be completely solved for large samples, the question
whether likelihood ratio tests are good tests from the point of view of
type II errors still remains. Studies show that when best tests as given
by the Neyman-Pearson lemma do not exist, likelihood ratio tests are
often equivalent to tests that are known to be very desirable from the
type II error point of view, particularly for large samples. Thus, when
best tests do not exist, it is usually safe to employ a likelihood ratio test,
provided that the samples are fairly large.


9.2 Estimation 

An introduction to the problem of estimating parameters of frequency 
functions was given in Chapter 3. In that chapter maximum likelihood 
estimation was introduced as a favorite method of many statisticians for 
obtaining point estimates of parameters. In this chapter properties to be 
desired in point estimates are considered, and estimation by means of 
intervals is introduced. 


9.2.1 Unbiased Estimates 

Perhaps the first property of an estimate that one would think of as 
being desirable is the property of the estimate converging, in some sense, 
to the value of the parameter as the sample size becomes increasingly large. 
Since almost any reasonable estimate will possess such a property, a closely 
related property that is somewhat more restrictive is often considered 
instead. This is the property of being unbiased. For the purpose of 
defining this term, consider a random variable x whose frequency function 
depends on a parameter $\theta$. Let $x_1, x_2, \dots, x_n$ represent a sample of size $n$
from the corresponding population and let $t(x_1, \dots, x_n)$ be any statistic
being contemplated as an estimator of $\theta$. Then the property of being
unbiased may be defined as follows.

(17) Definition: The statistic $t = t(x_1, x_2, \dots, x_n)$ is called an
unbiased estimate (or estimator) of the parameter $\theta$ if $E[t] = \theta$.

This property merely states that the random variable $t$ possesses a
distribution whose mean is the parameter $\theta$ being estimated. This
property was shown in section 6.6 to hold, for example, for $t = \bar{x}$ when
estimating the mean $\mu$ of a distribution.

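As an illustration of how the bias in a statistic may sometimes be
determined by means of expected value formulas, consider the expected
value of a sample variance based on a random sample of size $n$. From
properties of $E$, and the definition of $s^2$, it follows that

(18)
$$\begin{aligned}
E[s^2] &= E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}\bigl[(x_i-\mu)-(\bar{x}-\mu)\bigr]^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 - (\bar{x}-\mu)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^{n}E(x_i-\mu)^2 - E(\bar{x}-\mu)^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}\sigma^2 - \frac{\sigma^2}{n} \\
&= \frac{n-1}{n}\,\sigma^2
\end{aligned}$$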

This shows that $s^2$ is not an unbiased estimate of $\sigma^2$, which means that
if repeated samples of size $n$ are taken and the resulting sample variances
are averaged, the average will not approach the true variance in value
but will be consistently too small by the factor of $(n - 1)/n$. For small
samples this factor becomes important; consequently, one must be
careful how he combines samples in making an estimate of the true
variance when an unbiased estimate is desired. In order to overcome the
bias in $s^2$ it is merely necessary to multiply $s^2$ by $n/(n - 1)$ and use the
resulting quantity as the estimate of $\sigma^2$. Then, because of (18),

$$E\left[\frac{n}{n-1}\,s^2\right] = \frac{n}{n-1}\,E[s^2] = \sigma^2$$

Since

$$\frac{n}{n-1}\,s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$

it is clear that one can avoid the bias in estimating variances by dividing
the sum of squares of deviations by $n - 1$ rather than by $n$, as was the
practice in the preceding chapters. It is because of this property that
some authors define the sample variance as $\sum(x_i - \bar{x})^2/(n - 1)$.
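
Result (18) can be verified exactly for a small discrete population by enumerating every possible sample. The sketch below is an illustration under assumed values: the population $\{1, 2, 3, 4\}$ with equal probabilities and the sample size $n = 3$ are hypothetical, and sampling is with replacement so that the observations are independent.

```python
from fractions import Fraction
from itertools import product


def expected_sample_variance(values, n):
    """Exact E[s^2], with divisor n, over all equally likely samples of size n
    drawn independently (with replacement) from `values`."""
    total = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        mean = Fraction(sum(sample), n)
        total += sum((x - mean) ** 2 for x in sample) / n
        count += 1
    return total / count


values = [1, 2, 3, 4]  # hypothetical population, each value equally likely
n = 3
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

e_s2 = expected_sample_variance(values, n)
print(sigma2, e_s2, Fraction(n, n - 1) * e_s2)  # prints: 5/4 5/6 5/4
```

The expected value of $s^2$ is $\frac{2}{3}$ of $\sigma^2$ here, and multiplying by $n/(n-1) = 3/2$ restores $\sigma^2$ exactly.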

As a second illustration, consider the problem of how to combine
several sample variances to obtain a single unbiased estimate of the
population variance. Such a problem would arise, for example, in
quality-control work if one wished to obtain an unbiased estimate of the
variability of a manufacturing process as measured by $\sigma^2$ and had
available a number of daily estimates of the variability. Let $s_1^2, \dots, s_k^2$
denote $k$ sample variances based on samples of sizes $n_1, \dots, n_k$,
respectively. Then, if each sample variance is weighted with the size of the
sample on which it is based, the proper weighted average to use for
estimating $\sigma^2$ is given by

$$t = \frac{n_1 s_1^2 + \cdots + n_k s_k^2}{a}$$

where $a$ is chosen to make this estimate unbiased. From properties of $E$
and the result in (18), it follows that

$$E[t] = \frac{1}{a}\left[(n_1 - 1)\sigma^2 + \cdots + (n_k - 1)\sigma^2\right] = \frac{\sigma^2}{a}(n_1 + \cdots + n_k - k)$$

In order that $t$ be unbiased, it is therefore necessary to choose
$a = n_1 + \cdots + n_k - k$. Thus the desired estimate of $\sigma^2$ is given by

(19)    $$\frac{n_1 s_1^2 + \cdots + n_k s_k^2}{n_1 + \cdots + n_k - k}$$
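
Formula (19) translates directly into code. The following is a minimal sketch; the function name and the three sample variances and sizes are hypothetical values chosen only to show the calculation.

```python
def pooled_variance(sample_vars, sizes):
    """Unbiased pooled estimate of sigma^2 from formula (19).

    sample_vars are the divide-by-n_i variances s_i^2; sizes are the n_i.
    """
    numerator = sum(n_i * s2 for n_i, s2 in zip(sizes, sample_vars))
    return numerator / (sum(sizes) - len(sizes))


# Hypothetical daily variances and sample sizes.
pooled = pooled_variance([4.0, 6.0, 5.0], [10, 20, 15])
print(round(pooled, 4))  # (40 + 120 + 75) / (45 - 3)
```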

As an exercise to illustrate the convenience of using the operator $E$
for calculating mean values and at the same time to derive a useful
formula, consider the problem of expressing the variance of a linear
combination of a set of variables in terms of the variances and correlations
of the variables. Let

$$z = a_1 x_1 + \cdots + a_k x_k$$

be the function whose variance is desired. Then

$$E[z] = a_1\mu_1 + \cdots + a_k\mu_k$$

and

$$z - E[z] = a_1(x_1 - \mu_1) + \cdots + a_k(x_k - \mu_k)$$

Then, from the definition of the variance of a variable and this result,

$$\begin{aligned}
\sigma_z^2 &= E[z - E(z)]^2 \\
&= E[a_1(x_1-\mu_1) + \cdots + a_k(x_k-\mu_k)]^2 \\
&= E\Bigl[\sum_{i=1}^{k} a_i^2(x_i-\mu_i)^2 + 2\sum_{i<j} a_i a_j (x_i-\mu_i)(x_j-\mu_j)\Bigr] \\
&= \sum_{i=1}^{k} a_i^2 E(x_i-\mu_i)^2 + 2\sum_{i<j} a_i a_j E(x_i-\mu_i)(x_j-\mu_j)
\end{aligned}$$



Denoting the variance of the variable $x_i$ by $\sigma_i^2$ and the correlation
coefficient between $x_i$ and $x_j$ by $\rho_{ij}$, it follows from (12) and (13),
Chapter 18, that

$$E(x_i-\mu_i)(x_j-\mu_j) = \rho_{ij}\sigma_i\sigma_j$$

hence that

(20)    $$\sigma_z^2 = \sum_{i=1}^{k} a_i^2\sigma_i^2 + 2\sum_{i<j} a_i a_j \rho_{ij}\sigma_i\sigma_j$$

When the variables $x_1, \dots, x_k$ are uncorrelated, this formula reduces to
the well-known formula

(21)    $$\sigma_z^2 = \sum_{i=1}^{k} a_i^2\sigma_i^2$$

which is essentially equivalent to formula (14), Chapter 6. Formulas
(20) and (21) are very useful for determining the accuracy of estimates of
means of populations when these estimates are constructed as linear
combinations of other estimates.
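
Formulas (20) and (21) can be sketched as a single function. The name `linear_combination_variance`, the coefficients, standard deviations, and correlation matrix below are all hypothetical and serve only to illustrate the calculation.

```python
def linear_combination_variance(a, sigma, rho):
    """Variance of z = sum a_i x_i from formula (20).

    a     : coefficients a_i
    sigma : standard deviations sigma_i
    rho   : symmetric matrix of correlation coefficients rho_ij
    """
    k = len(a)
    variance = sum(a[i] ** 2 * sigma[i] ** 2 for i in range(k))
    for i in range(k):
        for j in range(i + 1, k):  # each pair i < j counted once, doubled
            variance += 2 * a[i] * a[j] * rho[i][j] * sigma[i] * sigma[j]
    return variance


# Variance of the difference x1 - x2 for two correlated variables.
correlated = linear_combination_variance(
    a=[1, -1], sigma=[3, 4], rho=[[1.0, 0.5], [0.5, 1.0]]
)
# With zero correlation, (20) reduces to (21).
uncorrelated = linear_combination_variance(
    a=[1, -1], sigma=[3, 4], rho=[[1.0, 0.0], [0.0, 1.0]]
)
print(correlated, uncorrelated)  # prints: 13.0 25.0
```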


9.2.2 Best Unbiased Estimates 

Although the property of being unbiased is a desirable one to seek
in an estimate, it is not nearly so important as the property of an estimate
being close in some sense to the parameter being estimated. Thus, if an
estimate $t$ is consistently closer to $\theta$ than another estimate $t'$ in repeated
samples of the same size, then $t$ would certainly be preferred to $t'$, even if
$t$ were biased and $t'$ were unbiased. Because of the difficulty or
impossibility of determining whether one of two estimates is closer than the
other to $\theta$ for any reasonable definition of closeness, it is customary to
substitute a measure of the variability of $t$ about $\theta$ in place of closeness.
Since the variance, or the standard deviation, has been used to measure
variability throughout the preceding chapters, one would naturally think
of selecting one or the other of these measures; however, unless $\theta$ happens
to be the mean of the distribution of $t$, the variance will not measure the
variability about $\theta$. This difficulty can be overcome by using the second
moment about $\theta$ as the desired measure. When $\theta$ is the mean of $t$, that is,
when $t$ is an unbiased estimate of $\theta$, this measure reduces to the variance
of $t$.

If now $t_1$ and $t_2$ are two estimates of $\theta$ that are to be compared, this
can be done by comparing their second moments about $\theta$. In this
connection, a statistic $t_1$ will be said to be better than the statistic $t_2$ for
estimating $\theta$, provided that $E(t_1 - \theta)^2 \le E(t_2 - \theta)^2$ for all possible
values of $\theta$ and provided that the strict inequality holds for at least one
value of $\theta$.



The problem of deciding whether an estimate is a good one in comparison
with all other possible estimates is not quite so simple. The difficulty
is that a trivial estimate such as $t = c$, where $c$ is some constant, will be
better as an estimate of the mean $\theta$ of a normal population than $\bar{x}$ when
$\theta$ happens to be equal to $c$. Thus one cannot expect to find a reasonable
estimate such as $\bar{x}$ to possess a second moment about $\theta$ that is a minimum
for all possible values of $\theta$. In order to avoid such paradoxical results,
it is customary to limit the discussion of the goodness of an estimate to
unbiased estimates. Since the property of being unbiased is required to
hold for all possible values of $\theta$, trivial estimates such as $t = c$ are
automatically eliminated from consideration. In view of the preceding
discussion, the following definition is introduced as a basis for choosing a
good estimate.

(22) Definition: A statistic $t = t(x_1, x_2, \dots, x_n)$ will be called a best
unbiased estimate (estimator) of the parameter $\theta$ if it is unbiased and if it
possesses minimum variance among all unbiased estimates (estimators).

This property must hold for all possible values of $\theta$, that is, regardless
of what the true value of the parameter may be. Although there are other
definitions of a best estimate in use, the preceding definition is one that is
frequently used. It should be realized that the variance was selected in
(22) because it was considered to measure the concentration of the
distribution of $t$ about $\theta$. Since it is easy to construct an example of a
distribution in which most of the distribution is heavily concentrated about
$\theta$, yet for which the second moment is extremely large, one must appreciate
that the second moment is not foolproof for giving the comparison of
estimates that one originally had in mind. Nevertheless, the same type
of criticism can be leveled at any other substitute; furthermore,
experience and theory have shown that (22) is a very useful definition.

As an application of the preceding ideas, consider the problem of
determining whether some weighted average of a random sample from
a population can yield a better unbiased estimate of the population
mean than the sample mean. Let the two competing estimates be written

$$t_1 = a_1 x_1 + \cdots + a_n x_n$$

and

$$t_2 = \bar{x}$$

The unknown $a$'s in $t_1$ are selected to make $t_1$ unbiased and to minimize
$E(t_1 - \mu)^2$. In order to determine the bias in $t_1$, calculate

$$\begin{aligned}
E[t_1] &= a_1 E[x_1] + \cdots + a_n E[x_n] \\
&= a_1\mu + \cdots + a_n\mu \\
&= (a_1 + \cdots + a_n)\mu
\end{aligned}$$

The statistic $t_1$ will be unbiased if the $a$'s are restricted to satisfy

$$a_1 + \cdots + a_n = 1$$

This merely states that the sum of the coefficients in $t_1$ must be 1; hence
the restriction can be ignored if $t_1$ is written in the form

$$t_1 = \frac{c_1 x_1 + \cdots + c_n x_n}{c_1 + \cdots + c_n}$$

Since $t_1$ is now unbiased, its second moment about $\mu$ is merely its variance.
Because the variables $x_1, \dots, x_n$ are independent and have the same
variance, it follows from formula (21) that the variance of $t_1$ is given by

$$\sigma_{t_1}^2 = \sigma^2\,\frac{\sum_{i=1}^{n} c_i^2}{\left(\sum_{i=1}^{n} c_i\right)^2}$$

Now choose the $c$'s to minimize this expression. Using calculus methods,
the $c$'s must satisfy the equations

$$\frac{\partial \sigma_{t_1}^2}{\partial c_k} = 0 \qquad (k = 1, 2, \dots, n)$$

These equations reduce to

$$c_k = \frac{\sum_{i=1}^{n} c_i^2}{\sum_{i=1}^{n} c_i} \qquad (k = 1, 2, \dots, n)$$

This result shows that the best linear combination to use is the one in
which the coefficients are all equal, since $c_k$ does not depend on $k$, in
which case $t_1$ reduces to $\bar{x}$. Thus no linear combination of the sample
can yield a better unbiased estimate than the sample mean $\bar{x}$. If the
variable $x$ is normally distributed, it can be shown that $\bar{x}$ is not only the
best linear combination of the sample values to use but the best function
of any kind to use, that is, $\bar{x}$ minimizes $E(t' - \mu)^2$, where $t'$ is any
unbiased estimate of $\mu$. A proof of this fact is given in the appendix as an
application of a formula that is derived there to enable one to determine
whether a particular estimate satisfies definition (22) for being a best
unbiased estimate.
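
The conclusion that equal weights minimize the variance can be illustrated numerically. The sketch below evaluates the factor $\sum c_i^2 / (\sum c_i)^2$, which multiplies $\sigma^2$ in the variance of the weighted mean; the function name and the particular weights are hypothetical.

```python
def weighted_mean_variance_factor(c):
    """Variance of (sum c_i x_i) / (sum c_i), expressed in units of sigma^2,
    for independent variables with a common variance."""
    total = sum(c)
    return sum(ci ** 2 for ci in c) / total ** 2


equal = weighted_mean_variance_factor([1, 1, 1, 1])    # the sample mean, 1/n
unequal = weighted_mean_variance_factor([1, 2, 3, 4])  # an arbitrary weighting
print(equal, unequal)  # prints: 0.25 0.3
```

With equal weights the factor is $1/n$, the familiar variance of $\bar{x}$ in units of $\sigma^2$; the unequal weighting tried here gives a larger value.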


9.2.3 Maximum Likelihood Estimates

In Chapter 3 maximum likelihood estimation was introduced, on the 
grounds that it is a popular method for finding point estimates. This 
popularity rests on the ease with which such estimates are usually obtained 
and with the desirable properties that they possess. 





Among the desirable features of maximum likelihood estimates is 
their property of often yielding best estimates. Examples can be found 
for which the maximum likelihood estimate is a poor one; however, for 
most applications it is either a best estimate or very nearly so. 

A second desirable feature of maximum likelihood estimates is their
excellent large sample properties. If $\hat{\theta}$ denotes such an
estimate, and if some mild restrictions are placed upon the frequency
function $f(x; \theta)$, it can be shown that the variable

(23)    $\dfrac{\hat{\theta} - \theta}{c/\sqrt{n}}$

possesses a distribution approaching that of a standard normal variable
as $n \to \infty$. The constant $c$ in the denominator depends on
$f(x; \theta)$. The situation here is very similar to that in 6.6, where
it was shown that $(\bar{x} - \mu)\sqrt{n}/\sigma$ possesses a distribution
approaching that of a standard normal variable. It is customary to call
such limiting distributions asymptotic distributions. Thus the maximum
likelihood estimate $\hat{\theta}$ is said to be asymptotically normally
distributed. The quantity $c/\sqrt{n}$ in the denominator of (23) is
called the asymptotic standard deviation of $\hat{\theta}$. Now it can be
shown that among all estimates that are asymptotically normally
distributed, the maximum likelihood estimate possesses minimum asymptotic
variance. Thus, in the sense of possessing minimum asymptotic variance,
one can say that among all asymptotically normally distributed estimates
the maximum likelihood estimate is a best estimate.
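
The large sample behavior described above can be checked by simulation.
The following is a minimal sketch, not part of the original text. It
assumes a Bernoulli population, for which the maximum likelihood estimate
of $p$ is the sample proportion and the asymptotic standard deviation is
$\sqrt{p(1-p)/n}$; the values n = 400, p = .3, the seed, and the function
name are illustrative choices only.

```python
import random
from math import sqrt

def standardized_mle(n, p, rng):
    """One sample of n Bernoulli trials; return the standardized estimate."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    p_hat = successes / n                     # maximum likelihood estimate of p
    return (p_hat - p) / sqrt(p * (1 - p) / n)

rng = random.Random(1)                        # fixed seed so the run is repeatable
values = [standardized_mle(400, 0.3, rng) for _ in range(2000)]

# For a standard normal variable about 95 per cent of values lie within 1.96.
inside = sum(1 for v in values if abs(v) < 1.96) / len(values)
print(f"fraction within +/-1.96: {inside:.3f}")
```

The printed fraction should be close to .95, although the discreteness of
the binomial variable keeps it from matching the normal value exactly.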

It will be found that maximum likelihood estimates are often biased;
hence, if an unbiased estimate is desired, it may be necessary to multiply
the maximum likelihood estimate by a constant that depends on $n$, such
as was done with $s^2$, in order to obtain an unbiased estimate. In some
problems it is not possible to adjust a maximum likelihood estimate in
this manner.

The preceding properties are the principal ones that justify the popu- 
larity of maximum likelihood estimation. 


9.2.4 Confidence Intervals 

Thus far, only point estimates of parameters have been considered.
In many problems of estimation, however, one prefers an interval
estimate that will express the accuracy of the estimate as well. If the
sample is sufficiently large and the estimate is a maximum likelihood
estimate, one can use normal curve methods as indicated in the preceding
section to find such an interval; however, in order to be able to treat
more general problems, a more general method is needed for constructing
interval estimates. Such a method, known as the method of confidence
intervals, is now described by means of a particular example.

Suppose that a random sample of size 100 has been taken from a
population that is known to be normal and whose variance is known
to be equal to 16. Suppose, further, that the mean of this sample is 30.
Then the problem is to estimate the population mean $\mu$ by the use of
an interval of values of $x$. Since $\sigma^2 = 16$,
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = .4$. Although the value of $\mu$ is
unknown, it is known from the theory of Chapter 6 that for repeated
samples of the type being considered $\bar{x}$ will be normally
distributed about this value of $\mu$ with a standard deviation of .4;
consequently, the fixed but unknown interval given by $\mu \pm .8$ will
contain 95 per cent of such sample means in the long run. Since $\mu$ is
unknown and is to be estimated, one would be tempted to replace $\mu$ by
$\bar{x}$ to obtain the interval $\bar{x} \pm .8$ and to make the claim
that the probability is .95 that this interval will contain $\mu$. Such a
claim actually is correct if one interprets this probability in the
following manner.

If the interval $\bar{x} \pm .8$ is treated as a variable interval,
changing with each sample of 100, then in repeated sampling 95 per cent
of such intervals in the long run will contain $\mu$. This follows from
the fact that if 95 per cent of sample means $\bar{x}$ lie within .8 unit
of $\mu$, in 95 per cent of such samples $\mu$ must lie within .8 unit of
the corresponding $\bar{x}$. The situation is represented geometrically
in Fig. 3.

Each point represents an $\bar{x}$ based on a sample of 100. The upper
diagram corresponds to the case in which $\mu$ is assumed known and a
probability statement is made concerning $\bar{x}$'s. The lower diagram
corresponds to the case in which $\mu$ is assumed unknown and the variable
intervals $\bar{x} \pm .8$ are plotted. If a point lies inside the 95 per
cent band of the upper diagram, its interval in the lower diagram must
necessarily cover $\mu$, and not otherwise.

Fig. 3. Illustration of confidence interval methods.

In practice, only one such $\bar{x}$ is available, so that only the first
point and its corresponding interval of $30 \pm .8$ is available. On the
basis of this one experiment, the claim will be made that the interval
$30 \pm .8$ contains the population mean $\mu$. If for each such
experiment the same claim were made for the interval corresponding to
that experiment, then for these experiments 95 per cent of such claims
would be true in the long run. It is in this sense that correct
probability statements can be made concerning population parameters. The
interval $30 \pm .8$ is called a 95 per cent confidence interval for
$\mu$. The end points of a confidence interval for a parameter are called
confidence limits for the parameter.

It should be clearly understood that one is merely betting on the
correctness of the rule of procedure when applying the confidence interval
technique to a given experiment. It is obviously incorrect to make the
claim that the probability is .95 that the interval $30 \pm .8$ contains
$\mu$. The latter probability is either 1 or 0, depending on whether $\mu$
does or does not lie in this fixed interval. It is only when the random
interval $\bar{x} \pm .8$ is considered that one can make correct
probability statements of the type desired.
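
The long-run interpretation can be illustrated by a small simulation.
This is a sketch under stated assumptions rather than part of the
original text: the true mean is set to 30 purely so that coverage can be
counted (in practice it is unknown), sample means are drawn directly from
their normal distribution with standard deviation .4, and the seed and
number of trials are arbitrary.

```python
import random

mu, sigma, n = 30.0, 4.0, 100      # mu is assumed here only for the simulation
se = sigma / n ** 0.5              # standard deviation of the sample mean: .4
half_width = 2 * se                # the .8 used in the text

rng = random.Random(0)             # fixed seed so the run is repeatable
trials = 20000
covered = 0
for _ in range(trials):
    xbar = rng.gauss(mu, se)       # the mean of one sample of size 100
    if xbar - half_width < mu < xbar + half_width:
        covered += 1               # this random interval contains mu

coverage = covered / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

Each individual interval either contains $\mu$ or does not; only the
fraction over many repetitions settles near .95.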

The preceding illustration of a confidence interval was discussed from
a geometrical point of view. In most problems, however, one obtains
confidence intervals by analytical methods. Thus, for the preceding
example, one would first write

$P\{|\bar{x} - \mu| < .8\} = .95$

This statement may be written in the form

$P\{-.8 < \bar{x} - \mu < .8\} = .95$

If the two inequalities are rearranged, the statement becomes

$P\{\bar{x} - .8 < \mu < \bar{x} + .8\} = .95$

Since $\bar{x}$ is the random variable here, this statement must be
interpreted as saying that the probability is .95 that the random interval
$\bar{x} \pm .8$ will contain $\mu$ in its interior. If $\bar{x}$ is
replaced by its observed sample value, then the 95 per cent confidence
interval $30 \pm .8$ for $\mu$ is obtained.
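
The analytical steps above reduce to a short calculation. The sketch
below is illustrative only; the function name is hypothetical, and it
uses the normal deviate 1.96 from Python's `statistics.NormalDist`
instead of the rounded value 2 that gives the text's .8, so the limits
come out slightly narrower.

```python
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Interval xbar -/+ z * sigma / sqrt(n) for a normal mean, sigma known."""
    se = sigma / n ** 0.5                            # std. deviation of xbar
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for .95
    return xbar - z * se, xbar + z * se

low, high = mean_confidence_interval(xbar=30, sigma=4, n=100)
print(f"95 per cent confidence interval: {low:.2f} to {high:.2f}")
```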

The preceding analytical method for finding confidence intervals is
used extensively in the following chapters for finding confidence intervals
for the more common statistical parameters. An examination of this
illustration and those in the following chapters will reveal that the method
for finding confidence intervals consists in first finding a random variable,
call it $z$, that involves the desired parameter $\theta$ but whose
distribution does not depend on any unknown parameters. Thus
$z = \bar{x} - \mu$ is such a variable. Next, two numbers, $z_1$ and
$z_2$, are chosen such that

$P\{z_1 < z < z_2\} = 1 - \alpha$




where $1 - \alpha$ is the desired confidence coefficient, such as .95. Then
these two inequalities are solved so that this probability statement
assumes the form

$P\{\underline{\theta} < \theta < \bar{\theta}\} = 1 - \alpha$

where $\underline{\theta}$ and $\bar{\theta}$ are random variables
depending on $z$ but not involving $\theta$. Finally, one substitutes the
sample values in $\underline{\theta}$ and $\bar{\theta}$ to obtain a
numerical interval which is then the desired confidence interval. The
preceding technique does not always lead to a confidence interval because
the rearrangement of the probability inequality may not yield an interval.
It is also clear that any number of confidence intervals can be
constructed for a parameter by choosing $z_1$ and $z_2$ differently each
time or by choosing different random variables of the $z$ type. The
problem of determining which confidence interval is the shortest on the
average in some sense, hence to be preferred, is closely related to the
problem of finding best tests of hypotheses. If one chooses a random
variable $z$ that is known to yield a good test for the hypothesis
$H_0: \theta = \theta_0$, then the confidence interval based on $z$ will
turn out to be a good one also. The random variables discussed in
Chapter 11 for testing hypotheses are of this type; hence there will be
no discussion concerning the quality of the confidence intervals obtained
there.

As pointed out in the preceding discussion, the analytical method for
finding confidence intervals requires that the proper type of random
variable $z$ be available. When such a variable is not available, a more
general method may be employed to construct confidence intervals. The
method is explained for the case of a continuous variable $x$ whose
frequency function $f(x; \theta)$ depends on a single parameter $\theta$.

Let $\theta^* = \theta^*(x_1, \ldots, x_n)$ be an estimator of $\theta$
that is based on a random sample of size $n$ from the population
corresponding to $f(x; \theta)$, and let $g(\theta^*; \theta)$ be the
frequency function of $\theta^*$. Theoretically, at least, this
frequency function can be determined when $f(x; \theta)$ is given. Then a
95 per cent confidence interval for $\theta$ may be constructed in the
following manner.

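Suppose that $\theta$ is given any value whatever, say
$\theta = \theta_0$. Since $g(\theta^*; \theta_0)$ is now completely
specified, it would be possible to find two numbers $h_1$ and $h_2$ such
that

(24)    $\int_{-\infty}^{h_1} g(\theta^*; \theta_0)\, d\theta^* = .025$

and

(25)    $\int_{h_2}^{\infty} g(\theta^*; \theta_0)\, d\theta^* = .025$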




These two numbers, of course, would depend on the particular value given
to $\theta$; therefore, this dependence is indicated by writing $h_1$ and
$h_2$ as functions of $\theta$, namely, $h_1(\theta)$ and $h_2(\theta)$.
Now, consider the graphs of these two functions of $\theta$. A typical
pair of such graphs is illustrated in Fig. 4.

After a random sample of size $n$ has been drawn and the value of
$\theta^*$ calculated, draw a horizontal line $\theta^*$ units above the
$\theta$ axis as indicated in Fig. 4. If the two functions $h_1(\theta)$
and $h_2(\theta)$ are increasing functions, as shown in the sketch, then
this horizontal line will cut each curve in only one point. Let
$\theta_1$ and $\theta_2$ be the abscissas of these points of
intersection. Then the interval from $\theta_2$ to $\theta_1$ on the
$\theta$ axis is the desired 95 per cent confidence interval for $\theta$
because of the following considerations.

Whatever the true value of $\theta$ in $f(x; \theta)$ may be, call it
$\theta'$, it follows from the construction of $h_1(\theta)$ and
$h_2(\theta)$ as given by (24) and (25) that

$P\{h_1(\theta') < \theta^* < h_2(\theta')\} = .95$

Geometrically, this means that the probability is .95 that the horizontal
line of Fig. 4 corresponding to the estimator $\theta^*$ will cut the
vertical line through $\theta'$ somewhere between the two curves. This is
the situation illustrated in Fig. 4. If this type of intersection does
occur, then $\theta'$ must lie inside the interval $(\theta_2, \theta_1)$,
as shown. If this type of intersection does not occur, then $\theta'$
must lie outside the interval $(\theta_2, \theta_1)$. Since the
probability is .95 that an intersection of this type will occur,
regardless of what the true value, $\theta'$, may be, the probability is
.95 that an interval $(\theta_2, \theta_1)$ constructed in this manner
will contain the true value $\theta'$.

The preceding derivation assumed that the two functions $h_1(\theta)$ and
$h_2(\theta)$ were increasing functions of $\theta$. In most applications
this is the case. The arguments, of course, apply equally well to
decreasing functions and for confidence coefficients other than 95 per
cent. However, if the curves corresponding to $h_1(\theta)$ and
$h_2(\theta)$ were intersected in more than single points by horizontal
lines, the construction becomes more difficult and the confidence
interval becomes a set of intervals.



Fig. 4. Construction of confidence intervals. 




In many problems it is possible to find the confidence limits $\theta_1$
and $\theta_2$ without explicitly finding the functions $h_1(\theta)$ and
$h_2(\theta)$. It is clear from inspecting Fig. 4 that $\theta_1$ is the
value of $\theta$ for which $h_1(\theta) = \theta^*$ and that $\theta_2$
is the value of $\theta$ for which $h_2(\theta) = \theta^*$. Thus,
replacing $\theta^*$ in (24) and (25) by $t$, since $\theta^*$ is merely a
dummy variable of integration there, it follows that $\theta_1$ and
$\theta_2$ must be the values of $\theta$ satisfying the equations

(26)    $\int_{-\infty}^{\theta^*} g(t; \theta_1)\, dt = .025$

        $\int_{\theta^*}^{\infty} g(t; \theta_2)\, dt = .025$

It is often possible to solve these equations for $\theta_1$ and
$\theta_2$ and thus determine the desired confidence interval.
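
When the tail equations cannot be solved in closed form, they can be
solved numerically. The following sketch is an illustration only and
makes two assumptions not in the text: the estimator is taken to be
normally distributed about $\theta$ with a known standard deviation (so
that the answer can be checked against the earlier example), and simple
bisection is used as the root finder. All function names are
hypothetical.

```python
from statistics import NormalDist

def bisect_root(func, lo, hi, tol=1e-10):
    """Bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return (lo + hi) / 2

def confidence_limits(theta_star, se, tail=0.025, search=10.0):
    """Solve the two tail equations for theta_1 (upper) and theta_2 (lower)."""
    # Area below theta_star under g(t; theta) must equal the tail size.
    lower_tail = lambda th: NormalDist(th, se).cdf(theta_star) - tail
    # Area above theta_star under g(t; theta) must equal the tail size.
    upper_tail = lambda th: (1 - NormalDist(th, se).cdf(theta_star)) - tail
    theta_1 = bisect_root(lower_tail, theta_star, theta_star + search)
    theta_2 = bisect_root(upper_tail, theta_star - search, theta_star)
    return theta_2, theta_1

theta_2, theta_1 = confidence_limits(theta_star=30.0, se=0.4)
print(f"confidence interval: ({theta_2:.3f}, {theta_1:.3f})")
```

For this normal case the limits agree with $\bar{x} \pm 1.96(.4)$, which
is what the analytical method gives.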

Although this geometrical method of constructing confidence intervals
was discussed for the case of a continuous variable, it has proved useful
for discrete variables as well. To illustrate the application of the method
to a discrete variable, consider the problem of finding a 95 per cent
confidence interval for binomial $p$ if a sample of size 50 has yielded
the estimate $p' = .4$. Here $\theta = p$ and $\theta^* = p' = .4$. Since
the estimator $p'$ is a discrete random variable, the integrals in (26)
must be replaced by the appropriate sums; hence the confidence limits
$p_2$ and $p_1$ must satisfy the equations

(27)    $\sum_{t \le .4} g(t; p_1) = .025$

        $\sum_{t \ge .4} g(t; p_2) = .025$

where $g(t; p)$ is the frequency function for the estimator $t = p'$ and
the values of $t$ for which terms exist are values given by $t = x/50$
($x = 0, 1, \ldots, 50$). Now, because of this relationship,

$P\{x = k\} = P\{t = k/50\} = g(k/50; p)$

consequently, one can just as well work with the binomial variable $x$ as
with the variable $p'$. If equations (27) are expressed in terms of
binomial $x$, they will become

(28)    $\sum_{k=0}^{20} \frac{50!}{k!\,(50-k)!}\, p_1^k (1 - p_1)^{50-k} = .025$

        $\sum_{k=20}^{50} \frac{50!}{k!\,(50-k)!}\, p_2^k (1 - p_2)^{50-k} = .025$




It is possible to solve these equations by trial-and-error methods;
however, tables are available for sums of binomial probabilities to assist
one in the solution. Such tables yield the values $p_2 = .26$ and
$p_1 = .55$, which are therefore the desired confidence limits for $p$.
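
In place of tables, the two binomial tail equations can be solved
numerically. The sketch below is illustrative and not part of the
original text; it assumes 20 successes in 50 trials (the estimate .4) and
uses bisection as a mechanical form of the trial-and-error search. The
function names are hypothetical.

```python
from math import comb

def binomial_sum(p, n, k_from, k_to):
    """Sum of binomial probabilities P{x = k} for k_from <= k <= k_to."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_from, k_to + 1))

def bisect_root(func, lo, hi, tol=1e-10):
    """Bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

n, x, tail = 50, 20, 0.025
# Upper limit p_1: the lower tail, k = 0, ..., 20, has probability .025.
p_1 = bisect_root(lambda p: binomial_sum(p, n, 0, x) - tail, 0.0, 1.0)
# Lower limit p_2: the upper tail, k = 20, ..., 50, has probability .025.
p_2 = bisect_root(lambda p: binomial_sum(p, n, x, n) - tail, 0.0, 1.0)
print(f"p_2 = {p_2:.3f}, p_1 = {p_1:.3f}")
```

Rounded to two decimals, the results should agree with the table values
.26 and .55 quoted in the text.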

Although the variable $p'$ is a discrete variable, the problem was solved
as though the arguments for a continuous variable that led to equations
(26) were applicable to discrete variables also. Since the integrals in (24)
and (25) become sums for a discrete variable, it is not possible in general
to find numbers $h_1(p)$ and $h_2(p)$ such that these sums exactly equal
.025 for all values of $p$. It is customary, therefore, to choose $h_1(p)$
and $h_2(p)$ in such a manner that the corresponding sums come as close as
possible to but do not exceed .025. With this understanding, the arguments
for the continuous case will show that the probability will be at least
.95 that the confidence interval constructed in the foregoing manner will
contain $p$.


REFERENCES 

Additional material on the testing of hypotheses and on estimation may be found in 
the following books: 

Neyman, J., First Course in Probability and Statistics, Henry Holt and Co. 

Dixon and Massey, An Introduction to Statistical Analysis, McGraw-Hill Book Co. 
Mood, A. M., Introduction to the Theory of Statistics, McGraw-Hill Book Co. 

The derivation of the correction to the likelihood ratio test for testing the homogeneity
of a set of variances, although difficult, may be found in M. S. Bartlett, “Properties of
Sufficiency and Statistical Tests,” Proc. Royal Soc. London, Series A, 160, pp. 273 ff.


EXERCISES 

1. Suppose that you are testing $H_0: \mu = 2$ against $H_1: \mu = 1$ for
the Poisson distribution by means of a sample of size 2. Indicate by means
of a sketch in the sample space the part of the sample space you would
choose for the critical region. Give a justification for your choice.

2. Estimate the size of the type II error if the type I error is chosen to
be $\alpha = .16$, if you are testing $H_0: \mu = 7$ against
$H_1: \mu = 6$ for a normal distribution with $\sigma = 2$ by means of a
sample of size 25, and if the proper tail of the $\bar{x}$ distribution
is used as critical region.

3. In testing $H_0: \mu = 20$ for a normal distribution, what is the
probability that you will accept $H_0$ when the mean is actually
$\sigma/2$ units above 20, if a sample of size 9 is taken, and the
critical region is chosen as the two $2\frac{1}{2}$ per cent tails of the
$\bar{x}$ distribution?

4. Under the 3 possible hypotheses $H_1$, $H_2$, and $H_3$ a discrete
random variable $x$ has the following distributions:

x              1     2     3     4     5     6     7     8     9    10
f(x | H_1)     0    .58   .02   .05   .03   .11   .01   .07   .04   .09
f(x | H_2)    .60    0    .06   .08   .03   .01   .04   .12   .02   .04
f(x | H_3)    .54    0    .10   .03   .12   .06   .04   .01   .08   .02

(a) Choose $\alpha = .10$ and find a best critical region for testing
$H_1$ against $H_2$. (b) Determine whether there is a best critical region
of size $\alpha = .10$ for testing $H_1$ against both $H_2$ and $H_3$.

5. Forty pairs of runners have been matched with respect to ability. Each
member of a pair is given a pill, with 1 member receiving a stimulant in his pill.
Races are run between each pair. Let $x$ denote the number of races won by the
individuals who received the stimulant. Construct a best test for testing the
hypothesis $H_0: p = \frac{1}{2}$ against $H_1: p > \frac{1}{2}$, where $p$
is the probability that a stimulated runner will win an evenly matched
race. Choose a critical region that makes
a as close to .10 as possible. Calculate the power of this test for/? = J$. 

6. Graph the power function, by plotting a few points on it, for testing the
hypothesis $H_0: \mu = 0$ when using the two $2\frac{1}{2}$ per cent tails
of the $\bar{x}$ distribution as critical region, given that $x$ is
normally distributed with $\sigma = 1$ and that a sample of size 4 is used.

7. If X is normally distributed with a = 10 and it is desired to test Hq : j“ = 
against Hi ip ~ 110, how large a sample should be taken if the probability of 
accepting Hq when Hi is true is to be .02 and if a critical region of size .0^ is used? 

8. By means of the Neyman-Pearson lemma, prove that the best test for
testing the hypothesis $\sigma = \sigma_0$ against
$\sigma = \sigma_1 > \sigma_0$ for $x$ normally distributed with 0 mean is
given by choosing as critical region the region where
$\sum_{i=1}^{n} x_i^2 > c$, where $c$ is the proper constant.

9. Use the Neyman-Pearson lemma to determine the nature of a best critical
region based on a sample of size $n$ for testing $H_0: \theta = \theta_0$
against $H_1: \theta = \theta_1 < \theta_0$ if
$f(x; \theta) = (1 + \theta)x^{\theta}$, $0 \le x \le 1$.

10. Can the Neyman-Pearson lemma be applied to testing
$H_0: \theta = 1$ against $H_1: \theta = 2$ if $f(x; \theta) = 1/\theta$,
$0 \le x \le \theta$, and a sample of one is to be taken?

11. Given /(a;;/?) =pq^^p = 1 Md a best test based on a sarnple of 

n for testing H^iq — against Hi.q = Is this test also best for 

12. Given that $x$ is normally distributed with mean 0 and variance
$\sigma^2$, find the expression for $\lambda$ for the likelihood ratio
test for testing $H_0: \sigma = 1$.

13. Work problem 12 if the mean is $\mu$ rather than 0, with $\mu$ unknown.

14. Construct a likelihood ratio test for testing : 0 = 1, given that ; 0) = 
X '^ 0. Carry your solution to the stage of obtaining A as a function of x^ 

15. Construct a likelihood ratio test for testing $H_0: p = p_0$ by means
of $n$ observations of a binomial variable with probability $p$. Is this a
best test for some alternative value?

16. Work problem 15 if $N$ such experiments are carried out with numbers of
successes $x_1, \ldots, x_N$.

17. Construct a likelihood ratio test for problem 9. 

18. Given the following 5 sample variances based on 10 observations each, 
test the hypothesis that the 5 population variances are equal. The sample 
variances are 22, 40, 30, 32, 12. Assume normal samples. 

19. Using the fact that $-2 \log_e \lambda$ possesses an approximate
$\chi^2$ distribution with 1 degree of freedom for the likelihood ratio
test of problem 14, use the result of problem 14 and the following sample
values to test the hypothesis $H_0: \theta = 1$. The sample values are
1.5, 2, .8, 1.3, 2.8, .9, 1.6, .6, 4.2, 3.1, 1.4, 2.2, .7, 1.6, .8.

20. Given that $x$ is normally distributed and given the following 3 sample
values, (a) combine these 3 variances to yield an unbiased estimate of
$\sigma^2$ and (b) show that $(2s_1^2 + 2s_2^2 + s_3^2)/5$ is not an
unbiased estimate of $\sigma^2$. The sample values are $s_1^2 = 12$,
$s_2^2 = 10$, $s_3^2 = 14$, with $n_1 = 10$, $n_2 = 10$, $n_3 = 5$.

21. Using the expected value operator, derive an expression for the
correlation between $u = a_1x_1 + \cdots + a_kx_k$ and
$v = b_1x_1 + \cdots + b_kx_k$, where the $a$'s and $b$'s are constants and
the variables $x_1, \ldots, x_k$ are independently distributed.

22. Consider the variable
$z = (a_1x_1 + \cdots + a_kx_k)/(a_1 + \cdots + a_k)$, where the variables
$x_1, \ldots, x_k$ are independently distributed with 0 means and variances
$\sigma_1^2, \ldots, \sigma_k^2$. Prove that the variance of $z$ will be
minimized if the weight $a_i$ is chosen inversely proportional to
$\sigma_i^2$.

23. Given that = I, = 2, = 3, = 4, == 5, calculate the 

variance of z in problem 22 when (a) ai is chosen inversely proportional to 

(b) Ui is chosen equal to 1 jk. (c) Comment on the advantage of the weighting 
used in (a). 

24. Show that $2\bar{x}$ is an unbiased estimate of $\theta$ for
$f(x; \theta) = 1/\theta$, $0 \le x \le \theta$.

25. Show that the distribution function of
$z = \max\{x_1, x_2, \ldots, x_n\}$ is given by $(z/\theta)^n$, when $x$
has the distribution given in problem 24. In this connection see problem
41, Chapter 6. Use the preceding result to show that $(1 + 1/n)z$ is also
an unbiased estimate of $\theta$ in problem 24.

26. Compare the variances of the two estimates of 6 obtained in problems 
24 and 25. 

27. A fisheries investigator catches fish from a lake until he has obtained $x$
fish of a certain species. His total catch then is $N$. Assuming that the lake has a
very large number of fish, show that the frequency function of the variable $N$ is
given by $\binom{N-1}{x-1} p^x (1-p)^{N-x}$, $N = x, x + 1, \ldots$, where
$p$ is the proportion of this species in the lake. Use this result to show
that $(x - 1)/(N - 1)$ is an unbiased estimate of $p$, and that $x/N$ is a
biased estimate.

28. Find the maximum likelihood estimator of p for a binomial distribution 
based on a total of n trials. 

29. Find the maximum likelihood estimator of $p$ for a binomial distribution
based on $N$ experiments of $n$ trials each and with successes
$x_1, \ldots, x_N$.




30. Find the maximum likelihood estimator of for the frequency 

/(a;;0) = (InO^) % H o 7 

31. Find the joint maximum likelihood estimators of ^ and for | normal 
distribution. 

32. Given f(x) = where a is a given positive integer and c is a 

constant depending on a but not jbi, what steps would be required to find the 
maximum likelihood estimator of ? 

33. Find the maximum likelihood estimate of q for /(x; p) == pq^, p = 1 — 
iK = 1, 2, • • ' , if « experiments yielded the observations x^^ • * • , x^^ 

34. Given /(a? ; 0) = x ^ 0, (a) find the maximum likelihood estimator 
for 6 and (6) find the maximum likelihood estimator for the mean vatu^^ of x. 

35. Show that the situation occurring in problem 34 is typical, namely, that
the maximum likelihood estimator for a parameter of a frequency function is the
same as the estimator of that parameter when one expresses the parameter in
terms of the mean value of $x$ and finds the maximum likelihood estimator of the
latter parameter.

36. Show that the maximum likelihood estimator of $p_i$ in a multinomial
frequency function is given by $\hat{p}_i = n_i/n$, where $n_i$ is the
observed frequency in the $i$th cell.

37. Find an 80 per cent confidence interval for the mean of a normal
distribution if $\sigma = 2$ and if a sample of size 8 gave the values
9, 14, 10, 12, 7, 13, 11, 12.

38. Assuming that $n$ is large enough to justify the use of the normal
approximation to the binomial distribution, show that a 95 per cent
confidence interval for binomial $p$ is given by $p_1 < p < p_2$, where
$p_1$ and $p_2$ are solutions of the quadratic equation
$(p - p^2)(1.96)^2 = n(p' - p)^2$.

39. A lake contains $N$ fish. A netting experiment yielded $x$ fish, which
were marked and released. A second experiment yielded $y$ fish, of which
$z$ were found to be marked. If $y$ is small compared with $N$, show that
the maximum likelihood estimate of $N$ is given, approximately, by
$\hat{N} = xy/z$.

40. Given that $x$ is normally distributed with $\sigma = 1$, use the
general method for finding confidence intervals to find a confidence
interval for $\mu$ if $\bar{x} = 10$ and $n = 9$; that is, construct a
diagram similar to Fig. 4.

41. Assume that $x$ possesses a Poisson distribution with unknown mean
$\mu$. If 10 observations yielded the values 20, 23, 17, 16, 21, 22, 19,
19, 25, 18, find an approximate 90 per cent confidence interval for $\mu$.
Use a normal approximation, and base your interval on the sample mean only.

42. Apply the general method for finding confidence intervals to find a 90 per
cent confidence interval for $\theta$ in
$f(x; \theta) = (1 + \theta)x^{\theta}$, $0 \le x \le 1$, if only the
single observed value $x = .8$ is available.



CHAPTER 10 


Testing Goodness of Fit 


A problem that arises frequently in statistical work is the testing of 
the compatibility of a set of observed and theoretical frequencies. For 
example, if Mendelian inheritance suggests that four kinds of plants 
should occur in the proportions 9 : 3 : 3 : 1 and if a sample of 240 plants 
yielded 120, 40, 55, 25 in the four categories, one would like to know 
whether these frequencies are compatible with those expected under 
Mendelian inheritance. 

This type of problem has already been discussed and solved for the 
special case in which there are only two pairs of frequencies to be com- 
pared. Then the binomial distribution may be applied as shown in the 
first illustration of 5.3.4.5. When more than two pairs of frequencies 
are to be compared, the multinomial distribution, which was derived in 
5.4.2, is needed. 


10.1 The Test 

The problem that is being considered here can be formulated quite
generally in terms of the notation that was introduced in 5.4.2. In this
connection, consider an experiment in which there are $k$ mutually exclusive
possible outcomes $A_1, A_2, \ldots, A_k$. Let $p_i$ be the probability
that event $A_i$ will occur at a trial of the experiment and let $n$
trials be made. The number of trials producing outcome $A_i$ will be
denoted by $n_i$. Since $n_i$ is a binomial variable with probability
$p_i$ with respect to the single outcome $A_i$, the mean, or expected
value, of $n_i$ is given by

$e_i = E[n_i] = np_i$

In terms of this notation, the problem is to determine whether the
observed frequencies $n_1, \ldots, n_k$ are compatible with the expected
frequencies $e_1, \ldots, e_k$.

An analysis of the preceding discussion will show that the problem is
really one of testing a hypothesis because it is assumed that the
multinomial distribution is the appropriate model and that interest
centers on whether the postulated $p$'s are correct. Thus the problem can
be treated as a problem of testing the hypothesis

(1)    $H_0: p_i = p_{i0}$,  $i = 1, \ldots, k$

where the $p_{i0}$ are the postulated values of the probabilities of a
multinomial distribution.

The hypothesis expressed in (1) is a simple hypothesis, but unless
alternative values of the $p$'s are specified the alternative hypothesis is
composite. As a result, lemma (2) of Chapter 9 is not applicable;
consequently the likelihood ratio test is the natural test to employ here.
Now it will be found that the expression for $\lambda$ in this test is so
complicated that it is not feasible to find its distribution; therefore
only the large sample approximation given by theorem (12), Chapter 9, for
a general likelihood ratio test is ordinarily used.

If the various steps involved in evaluating $\lambda$ in (8), Chapter 9,
are carried out, it will be found that

(2)    $-2 \log_e \lambda = 2 \sum_{i=1}^{k} n_i \log_e \frac{n_i}{e_i}$

Now according to theorem (12), Chapter 9, this quantity possesses an
approximate $\chi^2$ distribution when $n$ is large. The number of degrees
of freedom here is given by $\nu = k - 1$ because the multinomial
distribution is determined by only $k - 1$ parameters in view of the
restriction that $\sum_{i=1}^{k} p_i = 1$. The test of hypothesis
therefore consists in choosing as critical region the right tail of the
$\chi^2$ distribution with $k - 1$ degrees of freedom.

Although (2) does yield a valid large sample test for the hypothesis
(1), this test is not the one that is customarily employed here. A
modification of it, which is based on approximating the right side of (2),
is more commonly used. This approximation is obtained by expanding the
logarithms and retaining only the dominating terms in much the same
manner as in the derivation of Theorem 3, Chapter 5. Since all other terms
converge to zero as $n \to \infty$, the results of such manipulations can
be expressed in the form of a theorem.

Theorem: If $n_1, \ldots, n_k$ and $e_1, \ldots, e_k$ are the observed and
expected frequencies, respectively, for the $k$ possible outcomes of an
experiment that is performed $n$ times, then, as $n$ becomes infinite, the
distribution of the quantity

(3)    $\sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i}$

will approach that of a $\chi^2$ variable with $k - 1$ degrees of freedom.

The test procedure here is the same as for the test based on (2). Thus,
after calculating the value of the quantity given by (3), one determines
whether this value exceeds the critical value that is obtained from the
table of critical values of the $\chi^2$ distribution given in Table III in
Appendix 2. Although this test was derived here as an approximate
likelihood ratio test, it was obtained by other methods many years before
likelihood ratio tests were introduced. Since statisticians were already
familiar with the preceding theorem when the test based on (2) was
introduced, they continued using the test based on (3), known as the
$\chi^2$ test for goodness of fit.

The derivation of the theorem given in connection with (3) as an
application of theorem (12), Chapter 9, is given in the appendix.

As a simple illustration of how to apply this theorem, consider a
typical problem. Suppose that a gambler’s die is rolled 60 times and a
record is kept of the number of times each face comes up. If the die is
an “honest” die, each face will have the probability $\frac{1}{6}$ of
appearing in a single roll. Therefore, each face would be expected to
appear 10 times in an experiment of this kind. Suppose that the experiment
produced the following results, where the row labeled $n_i$ represents the
observed frequencies and the row labeled $e_i$ represents the expected
frequencies.

Face      1     2     3     4     5     6
$n_i$    15     7     4    11     6    17
$e_i$    10    10    10    10    10    10

As explained in an earlier paragraph, the problem is one of testing a
hypothesis about a multinomial distribution, namely, the hypothesis

$H_0: p_1 = \cdots = p_6 = \frac{1}{6}$

Since $\nu = k - 1$ and $k = 6$ here, $\nu = 5$. If a critical region of
size .05 is chosen, it will consist of those values of the approximate
$\chi^2$ variable given in (3) that exceed the value $\chi_0^2$ which cuts
off 5 per cent of the right tail of the $\chi^2$ distribution with five
degrees of freedom. From Table III it will be found that
$\chi_0^2 = 11.1$. Now calculations show that

$\sum_{i=1}^{6} \frac{(n_i - e_i)^2}{e_i} = \frac{(15 - 10)^2}{10} + \frac{(7 - 10)^2}{10} + \frac{(4 - 10)^2}{10} + \frac{(11 - 10)^2}{10} + \frac{(6 - 10)^2}{10} + \frac{(17 - 10)^2}{10} = 13.6$

Since this value exceeds the critical value $\chi_0^2 = 11.1$, it lies in
the critical region and therefore the hypothesis $H_0$ is rejected. Thus
one would conclude that the gambler’s die is “dishonest.” The error
introduced in using the approximate $\chi^2$ distribution here would be
very small because $n$ is fairly large; consequently the test based on the
theorem in (3) may be applied with confidence to this problem.
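
The die calculation can be reproduced in a few lines. This sketch is an
illustration, not part of the original text. It computes both the
quantity (3) and the likelihood ratio quantity (2) for the observed
frequencies 15, 7, 4, 11, 6, 17, and compares (3) with the critical value
11.1 taken from the text rather than computed; the variable names are
hypothetical.

```python
from math import log

observed = [15, 7, 4, 11, 6, 17]        # n_i for faces 1 to 6
expected = [10] * 6                     # e_i = 60 * (1/6) for an honest die

# Quantity (3): sum of (n_i - e_i)^2 / e_i.
chi_square = sum((n - e) ** 2 / e for n, e in zip(observed, expected))

# Quantity (2): 2 * sum of n_i * log(n_i / e_i); empty cells contribute 0.
likelihood_ratio = 2 * sum(n * log(n / e)
                           for n, e in zip(observed, expected) if n > 0)

critical_value = 11.1                   # 5 per cent point, 5 d.f. (from the text)
reject = chi_square > critical_value

print(f"chi-square = {chi_square:.1f}, -2 log lambda = {likelihood_ratio:.2f}")
print("reject H0" if reject else "accept H0")
```

The two quantities are close to each other here, as would be expected
from the fact that (3) is an approximation to (2).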


10.2 Limitations on the Test 

Since the $\chi^2$ distribution is only an approximation to the exact
distribution of the quantity $\sum_{i=1}^{k} (n_i - e_i)^2/e_i$, care must
be exercised that the $\chi^2$ test is used only when the approximation is
good. Experience and theoretical investigations indicate that the
approximation is usually satisfactory, provided that the $e_i \ge 5$ and
$k \ge 5$. If $k < 5$, it is best to have the $e_i$ somewhat larger than 5.
This limitation is similar to that placed on the use of the normal curve
approximation to the binomial frequency function in which $np$ and $nq$
were required to exceed 5.

If the expected frequency of a cell does not exceed 5, this cell should
be combined with one or more other cells until the above condition is
satisfied. For example, suppose that the gambler’s die of the preceding
section had been rolled only 24 times and the following results had been
obtained:


Face      1     2     3     4     5     6
$n_i$     6     5     2     3     0     8
$e_i$     4     4     4     4     4     4

Here none of the expected frequencies exceeds 5; therefore, it is necessary
to combine each cell with some other cell. If successive pairs of cells are
combined, the preceding empirical rule will be satisfied and the following
table of values will be obtained:


Face     1 or 2    3 or 4    5 or 6
$n_i$      11         5         8
$e_i$       8         8         8

The application of the $\chi^2$ test will now yield a value of
$\chi^2 = 2.25$ with $\nu = 2$. From a theoretical point of view it is
legitimate to combine cells in any desired manner, provided that one is
not influenced by the observed frequencies. In many applications, however,
there are practical reasons for combining neighboring cells as in the
preceding impractical illustration.
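
The pooling of cells can be sketched as follows. This is an illustrative
example, not part of the original text; the helper name `pool_pairs` is
hypothetical, and it combines successive pairs of cells for the 24 rolls
with observed frequencies 6, 5, 2, 3, 0, 8 and expected frequency 4 per
face.

```python
observed = [6, 5, 2, 3, 0, 8]           # n_i for 24 rolls
expected = [4, 4, 4, 4, 4, 4]           # e_i = 24 * (1/6)

def pool_pairs(values):
    """Combine successive pairs of cells: (1,2), (3,4), (5,6)."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]

pooled_observed = pool_pairs(observed)  # faces 1 or 2, 3 or 4, 5 or 6
pooled_expected = pool_pairs(expected)

chi_square = sum((n - e) ** 2 / e
                 for n, e in zip(pooled_observed, pooled_expected))
degrees_of_freedom = len(pooled_observed) - 1

print(pooled_observed, pooled_expected)
print(f"chi-square = {chi_square}, degrees of freedom = {degrees_of_freedom}")
```

The pairing must be fixed before looking at the observed frequencies, in
keeping with the remark above about not being influenced by them.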


10.3 Applications 

In experiments on the breeding of flowers of a certain species, an 
experimenter obtained 120 magenta flowers with a green stigma, 48 
magenta flowers with a red stigma, 36 red flowers with a green stigma, 
and 13 red flowers with a red stigma. Theory predicts that flowers of 
these types should be obtained in the ratios 9:3:3:1. Are these 
experimental results compatible with the theory? 

This is a problem of testing the hypothesis 

$$H_0: p_1 = \tfrac{9}{16},\quad p_2 = \tfrac{3}{16},\quad p_3 = \tfrac{3}{16},\quad p_4 = \tfrac{1}{16}$$

for a multinomial distribution involving four cells and for which n = 217. 
Under $H_0$, the expected frequencies, correct to the nearest integer, are 
those in the second row of the following table. 


$n_i$    120    48    36    13 
$e_i$    122    41    41    14 


Calculations give 

$$\chi^2 = \frac{(120 - 122)^2}{122} + \frac{(48 - 41)^2}{41} + \frac{(36 - 41)^2}{41} + \frac{(13 - 14)^2}{14} = 1.9$$

From Table III the 5 per cent critical value of $\chi^2$ for three degrees 
of freedom is $\chi_0^2 = 7.8$; consequently the result is not significant. 
The hypothesis $H_0$ is acceptable here and thus there is no reason on the 
basis of this test for doubting that the theory is applicable to these data. 
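The arithmetic of this example may be reproduced as follows. The sketch is 
illustrative only; the variable names are assumptions, and the critical 
value 7.8 is simply the Table III figure quoted in the text rather than 
something computed here. The statistic is evaluated both with the expected 
frequencies rounded to the nearest integer, as in the text, and with the 
unrounded values. 

```python
observed = [120, 48, 36, 13]
ratios = [9, 3, 3, 1]                      # theoretical ratios 9:3:3:1

n = sum(observed)                          # 217 flowers
expected_exact = [n * r / sum(ratios) for r in ratios]
expected_rounded = [round(e) for e in expected_exact]   # as used in the text


def chi_square(obs, exp):
    """Sum of (n_i - e_i)^2 / e_i over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


chi2_rounded = chi_square(observed, expected_rounded)
chi2_exact = chi_square(observed, expected_exact)
dof = len(observed) - 1

critical_5pct = 7.8                        # Table III value quoted in the text
significant = chi2_rounded > critical_5pct

print(expected_rounded, round(chi2_rounded, 2), round(chi2_exact, 2), dof, significant)
```

Rounding the expected frequencies changes the statistic only in the third 
significant figure here, so the conclusion does not depend on it. 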

As a second application, consider the following problem. On the 
basis of extensive experience with trainees, a training station determined 
four scores in marksmanship so that equal numbers of trainees would be 
located in the resulting five categories of skill. A new group of 200 
trainees is given the marksmanship test with the following results: 


Category     I     II    III    IV     V 
$n_i$       54     44     40    35    27 
$e_i$       40     40     40    40    40 


If the five categories are listed according to increasing ability, would you 
be justified in claiming that the 200 trainees represent a typical group 
of trainees with respect to marksmanship? This problem may be treated 
as a problem of testing the hypothesis 

$$H_0: p_1 = \cdots = p_5 = \tfrac{1}{5}$$

for a multinomial distribution with n = 200. Calculations give 
$\chi^2 = 10.1$. Since $\chi_0^2 = 9.5$ for $\nu = 4$, this result is 
significant; hence one is justified in claiming that the new group of 
trainees is not typical of past trainees. Because of the excess frequencies 
at the lower end of the scale, the new trainees undoubtedly are inferior 
marksmen. 
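To see where the lack of agreement arises, it may help to list the 
contribution of each cell to $\chi^2$. The following sketch does so for the 
marksmanship data; the names are illustrative assumptions and the critical 
value 9.5 is the one quoted in the text. 

```python
observed = [54, 44, 40, 35, 27]            # categories I to V, lowest skill first
n = sum(observed)                          # 200 trainees
k = len(observed)
expected = [n / k] * k                     # 40 per category under H0

# Contribution of each cell to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
dof = k - 1

critical_5pct = 9.5                        # value quoted in the text for 4 d.f.
significant = chi2 > critical_5pct

for label, c in zip(["I", "II", "III", "IV", "V"], contributions):
    print(label, round(c, 3))
print(round(chi2, 2), dof, significant)
```

Nearly all of the statistic comes from categories I and V, which is the 
numerical counterpart of the remark about excess frequencies at the lower 
end of the scale. 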


10.4 Generality of the Test 

In the preceding applications the expected frequencies for the various 
cells were known because the cell probabilities were assumed known; 
however, many applications involve situations in which the cell 
probabilities are functions of some unknown parameters. For example, 
suppose that one is interested in studying the sex distribution of children 
in families having eight children. If it is assumed that the probability is 
p that a child selected at random from a family with eight children will be 
a son, and if N such families are selected, then the expected frequencies 
for the nine cells corresponding to 0, 1, 2, $\cdots$, 8 sons will be given 
by the successive terms in the expansion of the binomial $N(q + p)^8$. Here 
the probabilities for the various cells depend on the unknown parameter 
p. Except for crude work, the difficulty cannot be overcome by assuming 
that the two sexes are equally divided because experience has shown that 
p is slightly larger than $\tfrac{1}{2}$. 




Fortunately, the $\chi^2$ test possesses a remarkable property that permits 
it to be applied even when the cell probabilities depend on unknown 
parameters as in this problem. This property, although very difficult 
to prove, is very simple to state. It may be expressed as follows. 

(4) Property: The $\chi^2$ test is applicable when the cell probabilities 
depend on unknown parameters, provided that the unknown parameters are 
replaced by their maximum likelihood estimates and provided that one degree 
of freedom is deducted for each parameter estimated. 

It is assumed, of course, that the cell frequencies are large enough to 
justify the use of the regular $\chi^2$ test. Since $\nu = k - 1$ when there 
are $k$ cells and the cell probabilities are known, it follows that 
$\nu = k - 1 - l$ when the cell probabilities depend on $l$ parameters. The 
preceding property enables the $\chi^2$ test to be applied to a wide variety 
of problems involving the comparison of observed and expected frequencies. 
Some of these applications are considered in the next few sections. 
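As a sketch of how property (4) is used, consider the family example with 
p estimated from the data. The counts below are hypothetical figures 
invented for this illustration; they are not data from the text, and the 
function name binomial_fit_chi2 is likewise an assumption. With all nine 
cells kept separate, the maximum likelihood estimate of p is the observed 
proportion of sons, and one degree of freedom is deducted for it. 

```python
from math import comb


def binomial_fit_chi2(counts, n_trials):
    """Fit a binomial with estimated p to counts[x], x = 0..n_trials.

    Returns (p_hat, expected, chi2, dof), with dof = k - 1 - 1 because
    one parameter (p) is estimated from the data.
    """
    n_families = sum(counts)
    total_sons = sum(x * c for x, c in enumerate(counts))
    p_hat = total_sons / (n_trials * n_families)
    q_hat = 1.0 - p_hat
    expected = [
        n_families * comb(n_trials, x) * p_hat ** x * q_hat ** (n_trials - x)
        for x in range(n_trials + 1)
    ]
    chi2 = sum((c - e) ** 2 / e for c, e in zip(counts, expected))
    dof = len(counts) - 1 - 1
    return p_hat, expected, chi2, dof


# HYPOTHETICAL counts of families with 0, 1, ..., 8 sons (N = 1600).
counts = [8, 50, 170, 350, 440, 350, 170, 52, 10]

p_hat, expected, chi2, dof = binomial_fit_chi2(counts, 8)
print(round(p_hat, 4), dof, round(chi2, 2))
print(min(expected) >= 5)   # check the empirical rule of 10.2 before trusting chi2
```

The sketch does not pool cells; if any expected frequency fell below 5, 
cells would first have to be combined as in 10.2 and $k$ reduced 
accordingly. 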


10.5 Frequency Curve Fitting 

If a theoretical frequency function has been fitted to an empirical 
frequency function, the question whether the fit is satisfactory naturally 
arises. This question was asked, for example, in the exercise on fitting a 
normal curve to a histogram in Chapter 5. When a normal curve is fitted 
to a histogram, it is usually assumed that the data represent a sample 
selected at random from a normal population and that the fitted normal 
curve is an approximation to the population curve. Thus, the question 
whether a fit is satisfactory can be answered only if one knows what sort 
of histograms will be obtained in random samples from a normal popula- 
tion. 

Now, the $\chi^2$ test can be employed to give a partial answer to this 
question. Since the $\chi^2$ test is concerned only with comparing sets of 
observed and expected frequencies, it is capable of testing only those 
features of the fitted distribution that affect a lack of agreement in the 
compared sets of frequencies. For example, the $\chi^2$ test is not capable 
of distinguishing between the two curves shown in Fig. 1, in which the 
x axis has been divided into six intervals to give six cells for the 
$\chi^2$ test and in which the areas under the two curves for each of the 
six intervals are equal. 

Fig. 1. Two equivalent frequency functions. 

With this understanding of the capabilities of the $\chi^2$ test, consider 
the problem of testing the adequacy of the normal curve fit in Table 
2, Chapter 5. The frequencies labeled theoretical frequencies were 
obtained by integrating the fitted normal curve over the successive class 
intervals of the histogram. The fitted normal curve was obtained by 
replacing the parameters $\mu$ and $\sigma$ by their sample estimates 
$\bar{x}$ and $s$. If these frequencies are treated as the expected 
frequencies in the $\chi^2$ test, then the problem of comparing the observed 
and expected frequencies is the type discussed in the preceding section 
because the cell probabilities depend on the two parameters $\mu$ and 
$\sigma$. Since $\bar{x}$ and $s$ are the maximum likelihood estimates of 
$\mu$ and $\sigma$, the property stated in (4) permits the application of 
the $\chi^2$ test, provided that one chooses $\nu = 10 - 1 - 2 = 7$. 

Calculations here yield $\chi^2 = 10.4$. Since $\chi_0^2 = 14.1$ for 
$\nu = 7$, the hypothesis that the data were obtained from sampling a normal 
population is substantiated, as far as compatibility of corresponding pairs 
of frequencies is concerned, and so the fit in Fig. 6, Chapter 5, would be 
considered satisfactory from this point of view. 

Since $e_{10} = 2.2$ does not satisfy the empirical rule in 10.2, requiring 
all $e_i \ge 5$, one should combine the last cell with, say, the next to 
last cell before applying the $\chi^2$ test; however, it is obvious that 
this procedure will not alter the conclusions here. 

If a binomial distribution is fitted to an empirical frequency 
distribution by estimating the two parameters $p$ and $n$ in 
$f(x) = \dfrac{n!}{x!\,(n - x)!}\, p^x q^{n-x}$ from the data, the number of 
degrees of freedom in the $\chi^2$ test will be $\nu = k - 1 - 2 = k - 3$ 
just as in normal curve fitting; however, it often happens in binomial 
problems that one or more of the parameters will be specified from other 
considerations. For example, suppose that one were interested in studying 
the sex distribution in families of eight children. Here $n = 8$ is known; 
hence it is not obtained as a maximum likelihood estimate from the data. 
Consequently the number of degrees of freedom would be $k - 2$. If one were 
to assume that $p = \tfrac{1}{2}$ rather than estimate $p$ from the 
observations, the number of degrees of freedom would be $k - 1$. 

Since the fitting of a Poisson distribution involves only the parameter 
$\mu$, the $\chi^2$ test will possess $k - 2$ or $k - 1$ degrees of freedom, 
depending on whether $\mu$ is replaced by $\bar{x}$ or is known from other 
considerations. 

Property (4) of the $\chi^2$ test requires that the unknown parameters be 
estimated by the method of maximum likelihood using the likelihood 
function $p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$, corresponding to the 
multinomial distribution. If one calculates the maximum likelihood estimates 
of $\mu$ and $\sigma$ from this likelihood function when fitting a normal 
curve to a histogram, the resulting estimates will not be exactly equal to 
$\bar{x}$ and $s$, which are the maximum likelihood estimates of $\mu$ and 
$\sigma$ for ungrouped data; however, they will ordinarily be very nearly 
the same. Thus, although theoretically speaking one should calculate the 
maximum likelihood estimates for the multinomial situation, it suffices to 
use the well-known maximum likelihood estimates for the continuous 
situation. 


10.6 Contingency Tables 

Another very useful application of the $\chi^2$ test occurs in connection 
with testing the compatibility of observed and expected frequencies in 
two-way tables. Such two-way tables are usually called contingency 
tables. Table 1, in which are recorded the frequencies corresponding to 
the indicated classifications for a sample of 400, is an illustration of a 
contingency table. 

A contingency table is usually constructed for the purpose of studying 
the relationship between the two variables of classification. In particular, 
one may wish to know whether the two variables are related. By means 
of the $\chi^2$ test it is possible to test the hypothesis that the two 
variables are independent. Thus, in connection with Table 1, the $\chi^2$ 
test can be used to test the hypothesis that there is no relationship 
between an individual’s educational level and his adjustment to marriage. 

Table 1 

                       Marriage-Adjustment Score 

               Very low    Low        High       Very high    Totals 
College        18 (27)     29 (39)    70 (64)    115 (102)     232 
High school    17 (13)     28 (19)    30 (32)     41 (51)      116 
Grades only    11 (6)      10 (9)     11 (14)     20 (23)       52 
Totals         46          67         111         176          400 

















Before considering how the $\chi^2$ test may be applied to this particular 
problem, consider a general contingency table containing $r$ rows and $c$ 
columns. Let $p_{ij}$ be the probability that an individual selected at 
random from the population under consideration will be a member of the cell 
in the $i$th row and $j$th column of the contingency table. Let 
$p_{i\cdot}$ be the probability that the individual will be a member of the 
$i$th row and let $p_{\cdot j}$ be the probability that the individual will 
be a member of the $j$th column. Then the hypothesis that the two variables 
are independent can be written in the form 

$$H_0: p_{ij} = p_{i\cdot}\, p_{\cdot j}, \qquad i = 1, \cdots, r; \quad j = 1, \cdots, c$$

If a sample of $n$ individuals is selected and $n_{ij}$ of them are found in 
the cell in the $i$th row and $j$th column, then $\chi^2$ as defined by (3) 
will assume the form 

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - n p_{ij})^2}{n p_{ij}}$$

But under the hypothesis $H_0$, this expression will become 

(5)   $$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - n p_{i\cdot} p_{\cdot j})^2}{n p_{i\cdot} p_{\cdot j}}$$

Since the $p_{i\cdot}$ and $p_{\cdot j}$ are unknown, it is necessary to 
estimate them from the sample. If the estimates are maximum likelihood 
estimates, the theory discussed in 10.4 will permit the $\chi^2$ test to be 
applied here, provided that one degree of freedom is deducted for each 
parameter so estimated. Since $\sum_{1}^{r} p_{i\cdot} = 1$ and 
$\sum_{1}^{c} p_{\cdot j} = 1$, there are $r - 1 + c - 1 = r + c - 2$ 
parameters that need to be estimated; hence the proper number of 
degrees of freedom for testing independence in a contingency table of $r$ 
rows and $c$ columns is given by 
$\nu = k - 1 - l = rc - 1 - (r + c - 2) = (r - 1)(c - 1)$. 

In order to complete the discussion, it is necessary to find the maximum 
likelihood estimates of the $p_{i\cdot}$ and $p_{\cdot j}$. For this purpose 
let $n_{i\cdot}$ denote the sum of the frequencies in the $i$th row and let 
$n_{\cdot j}$ denote the sum of the frequencies in the $j$th column. Since 
the variables are discrete, the likelihood of the sample is the probability 
of obtaining the sample in the order in which it occurred. Thus the 
likelihood of the sample is given by 

$$L = \prod_{i=1}^{r} \prod_{j=1}^{c} p_{ij}^{\,n_{ij}}$$




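But, because of $H_0$ and the definitions of $n_{i\cdot}$ and 
$n_{\cdot j}$, this will reduce to 

$$L = \prod_{i=1}^{r} \prod_{j=1}^{c} (p_{i\cdot}\, p_{\cdot j})^{n_{ij}}
    = \prod_{i=1}^{r} \prod_{j=1}^{c} p_{i\cdot}^{\,n_{ij}} \prod_{i=1}^{r} \prod_{j=1}^{c} p_{\cdot j}^{\,n_{ij}}
    = \prod_{i=1}^{r} p_{i\cdot}^{\,\sum_{j} n_{ij}} \prod_{j=1}^{c} p_{\cdot j}^{\,\sum_{i} n_{ij}}
    = \prod_{i=1}^{r} p_{i\cdot}^{\,n_{i\cdot}} \prod_{j=1}^{c} p_{\cdot j}^{\,n_{\cdot j}}$$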

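Before differentiating $L$ with respect to $p_{i\cdot}$ for maximizing 
purposes, it is necessary to express one of the $p_{i\cdot}$’s, say 
$p_{r\cdot}$, in terms of the remaining ones through the relation 
$\sum_{1}^{r} p_{i\cdot} = 1$. If this is done, $L$ will assume the form 

$$L = \left(1 - \sum_{i=1}^{r-1} p_{i\cdot}\right)^{n_{r\cdot}} \prod_{i=1}^{r-1} p_{i\cdot}^{\,n_{i\cdot}} \prod_{j=1}^{c} p_{\cdot j}^{\,n_{\cdot j}}$$

Taking logarithms, 

$$\log L = n_{r\cdot} \log\left(1 - \sum_{i=1}^{r-1} p_{i\cdot}\right) + \sum_{i=1}^{r-1} n_{i\cdot} \log p_{i\cdot} + K$$

where $K$ does not involve the variable $p_{i\cdot}$. Now, differentiating 
with respect to $p_{i\cdot}$ and setting the derivative equal to 0 for a 
maximum, 

$$\frac{\partial \log L}{\partial p_{i\cdot}} = -\frac{n_{r\cdot}}{1 - \sum_{1}^{r-1} p_{i\cdot}} + \frac{n_{i\cdot}}{p_{i\cdot}} = 0$$

Since $1 - \sum_{1}^{r-1} p_{i\cdot} = p_{r\cdot}$, this equation is 
equivalent to 

$$p_{i\cdot} = \frac{p_{r\cdot}}{n_{r\cdot}}\, n_{i\cdot} = \lambda n_{i\cdot}$$

where $\lambda$ does not depend on the index $i$. Since this must hold for 
$i = 1, 2, \cdots, r$ and since 

$$1 = \sum_{1}^{r} p_{i\cdot} = \lambda \sum_{1}^{r} n_{i\cdot} = \lambda n$$

it follows that $\lambda = 1/n$ and that the maximum likelihood estimate of 
$p_{i\cdot}$ is 

$$\hat{p}_{i\cdot} = \frac{n_{i\cdot}}{n}$$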



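By symmetry the maximum likelihood estimate of $p_{\cdot j}$ is 

$$\hat{p}_{\cdot j} = \frac{n_{\cdot j}}{n}$$

If $p_{i\cdot}$ and $p_{\cdot j}$ in (5) are replaced by their maximum 
likelihood estimates, $\chi^2$ will become 

(6)   $$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(n_{ij} - \dfrac{n_{i\cdot} n_{\cdot j}}{n}\right)^2}{\dfrac{n_{i\cdot} n_{\cdot j}}{n}}$$

According to the theory of 10.4, this quantity may be treated as possessing 
a $\chi^2$ distribution with $(r - 1)(c - 1)$ degrees of freedom, provided 
that $n$ is sufficiently large and $H_0$ is true. 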

Now, consider the application of (6) to testing independence in Table 
1. To calculate the values of the $n_{i\cdot} n_{\cdot j}/n$ in the $i$th 
row, it is merely necessary to multiply the column totals $n_{\cdot j}$ by 
the fraction $n_{i\cdot}/n$; thus the values of $n_{1\cdot} n_{\cdot j}/n$ 
for the first row of Table 1 are obtained by multiplying the column totals 
by 232/400 and similarly for the remaining rows. These values, correct to 
the nearest integer, are inserted in parentheses in Table 1. The calculation 
of $\chi^2$ is now like that for (3), with the values in parentheses treated 
as the $e_i$. It will be found that $\chi^2 = 20.7$. Since 
$\chi_0^2 = 12.6$ for $(3 - 1)(4 - 1) = 6$ degrees of freedom, this result 
is significant and the hypothesis $H_0$ of independence is therefore 
rejected. An inspection of Table 1 shows that individuals with some college 
education appear to adjust themselves to marriage more readily than those 
with less education. 
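The calculation for Table 1 can be sketched in Python as follows. The 
function name independence_chi2 is an assumption made for this 
illustration. The two "Very low" entries for the College and Grades only 
rows (18 and 11) are taken here as the values implied by the row and column 
totals of Table 1. The statistic is computed once with expected frequencies 
rounded to the nearest integer, as in the text, and once without rounding. 

```python
def independence_chi2(table, round_expected=False):
    """Chi-square statistic (6) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # n_i. * n_.j / n
            if round_expected:
                e_ij = round(e_ij)
            chi2 += (n_ij - e_ij) ** 2 / e_ij
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Rows: College, High school, Grades only.
# Columns: Very low, Low, High, Very high.
table = [
    [18, 29, 70, 115],
    [17, 28, 30, 41],
    [11, 10, 11, 20],
]

chi2_rounded, dof = independence_chi2(table, round_expected=True)
chi2_exact, _ = independence_chi2(table)

critical_5pct = 12.6   # value quoted in the text for 6 degrees of freedom
print(round(chi2_rounded, 1), round(chi2_exact, 1), dof)
print(chi2_rounded > critical_5pct, chi2_exact > critical_5pct)
```

Rounding the expected frequencies moves the statistic by somewhat less than 
one unit here, but either version exceeds the critical value, so the 
rejection of independence does not hinge on the rounding. 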


10.7 Indices of Dispersion 

It frequently happens that an experimenter has a set of data that he 
believes can be treated as having been obtained from sampling a binomial 
population, or possibly a Poisson population, but which contains so few 
values that it is useless to attempt to fit a binomial, or Poisson, 
distribution to the observed distribution. In such situations one can often 
test the hypothesis that the data came from a population of the assumed 
type by testing whether the sample variance is compatible with the 
theoretical variance. This test can be obtained as a slight modification of 
the $\chi^2$ test for contingency tables. 

Let $x_1, x_2, \cdots, x_k$ represent the number of successes for $k$ 
samples of $n$ trials each taken from the same binomial population. Then 
$n - x_1$, $n - x_2$, $\cdots$, $n - x_k$ will represent the corresponding 
failures. These two sets of numbers may be arranged in the following 
two-way table: 

(7)      $x_1$        $x_2$        $\cdots$    $x_k$ 
         $n - x_1$    $n - x_2$    $\cdots$    $n - x_k$ 

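If this table is treated as though it were an ordinary contingency table 
and if the technique used to arrive at the maximum likelihood estimates 
for the $e_i$, as in (6), is used, the estimates for the $e_i$ in the first 
row will be given by 

$$n\, \frac{\sum_{1}^{k} x_i}{nk} = \bar{x}$$

As a consequence, the estimates for the $e_i$ in the second row will become 
$n - \bar{x}$. With these estimates, the value of $\chi^2$ as given by (6) 
will reduce to 

(8)   $$\chi^2 = \sum_{1}^{k} \frac{(x_i - \bar{x})^2}{\bar{x}} + \sum_{1}^{k} \frac{(x_i - \bar{x})^2}{n - \bar{x}} = \frac{\sum_{1}^{k} (x_i - \bar{x})^2}{\bar{x}\left(1 - \dfrac{\bar{x}}{n}\right)}$$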

The contingency table on which this result is based differs slightly from 
the ordinary contingency table treated previously. For the ordinary 
table, successive observations are free to fall in any one of the cells; 
however, for this table the first $n$ observations must fall in either of 
the two cells of the first column only, the second $n$ observations must 
fall in either of the two cells of the second column only, etc. The general 
theory of 10.4 shows, however, that the $\chi^2$ test as applied in (6) is 
applicable to this modified type of contingency table also; hence (8) may be 
assumed to possess a $\chi^2$ distribution with $k - 1$ degrees of freedom. 

It will be observed that the numerator in (8) is $k$ times the sample 
variance. For a binomial distribution the variance may be expressed in 
the form 

$$\sigma^2 = npq = \mu\left(1 - \frac{\mu}{n}\right)$$

If $\mu$ is replaced by the sample mean, it will be observed that the 
denominator in (8) is a second sample estimate of the variance. Thus 
$\chi^2$ is essentially $k$ times the ratio of two sample estimates of the 
binomial variance. If the $x_i$ are from different binomial populations 
rather than from the same binomial population, there will be a tendency for 
the numerator estimate to be large relative to the denominator estimate and 
thus to give rise to a significant value of $\chi^2$. Thus the $\chi^2$ test 
essentially tests the hypothesis that the data came from the same binomial 
population by the device of checking on the variability of the data. Because 
of this property of the test, the expression (8) is called the binomial 
index of dispersion. 

As an illustration of the application of (8), consider the following data 
giving the number of infected plants per plot of 90 plants, for 12 plots: 
19, 6, 9, 18, 15, 13, 14, 15, 16, 20, 22, 14. The problem here is to 
determine whether it is reasonable to assume that the rate of infection is 
the same over the 12 plots. This problem may be treated as the problem of 
testing the hypothesis 

$$H_0: p_1 = p_2 = \cdots = p_{12}$$

where $p_i$ denotes the probability that a plant selected at random from 
the $i$th plot will be infected. Calculations applied to (8) give, to the 
indicated accuracy, 

$$\bar{x} = 15.1, \qquad \sum_{1}^{12} (x_i - \bar{x})^2 = 223, \qquad \chi^2 = 18$$

For 11 degrees of freedom, $\chi_0^2 = 19.7$; consequently, $H_0$ would be 
accepted here. Since $\chi^2 = 18$ is so close to the critical value and 
since the sample is so small, one would be tempted to suspend judgment here 
until more data became available. For data of this type, it often happens 
that the infection is localized and gradually spreads from such localized 
centers of concentration. If such were the case, one would expect the 
hypothesis of homogeneity to be rejected because some plots would have 
a high rate of infection whereas others might still be largely untouched 
by the infection. 
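The figures in this illustration can be reproduced with a short sketch of 
formula (8). The function name binomial_dispersion_index is an assumption 
introduced here, and the critical value 19.7 is the one quoted in the text. 

```python
def binomial_dispersion_index(xs, n):
    """Binomial index of dispersion (8) for k samples of n trials each.

    Returns (mean, sum of squared deviations, chi2, degrees of freedom).
    """
    k = len(xs)
    mean = sum(xs) / k
    ss = sum((x - mean) ** 2 for x in xs)
    chi2 = ss / (mean * (1 - mean / n))
    return mean, ss, chi2, k - 1


infected = [19, 6, 9, 18, 15, 13, 14, 15, 16, 20, 22, 14]   # 12 plots
plants_per_plot = 90

mean, ss, chi2, dof = binomial_dispersion_index(infected, plants_per_plot)

critical_5pct = 19.7        # value quoted in the text for 11 d.f.
print(round(mean, 1), round(ss), round(chi2, 1), dof, chi2 > critical_5pct)
```

The unrounded statistic is a little below 18, which is consistent with the 
text's remark that the result lies close to, but under, the critical value. 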

If the value of $p$ is very small and the value of $n$ is very large, the 
value of $\bar{x}/n$, which is the sample estimate of $p$, will be very 
small; consequently, the value of $1 - \bar{x}/n$ will be very nearly equal 
to one. If this approximation is used in (8), the binomial index of 
dispersion reduces to what is known as the Poisson index of dispersion, 
namely, 

$$\chi^2 = \frac{\sum_{1}^{k} (x_i - \bar{x})^2}{\bar{x}}$$

It would appear that the Poisson index is merely a special case of the 
standard $\chi^2$ test given in (3) for those situations in which the 
expected frequencies are equal; however, there is a distinction in the 
nature of the variables. The sum of the frequencies in the ordinary 
$\chi^2$ test represents the total number of observations made, whereas in 
applications of the Poisson index there are but $k$ observations, each 
observation yielding a result that happens to be an integer. It is important 
to distinguish between these two types of problems in order to avoid the 
mistake of applying the ordinary $\chi^2$ test to the first row only of the 
binomial frequencies in (7). Such an application would be equivalent to 
assuming that the data came from a Poisson rather than a binomial 
population. 

As an illustration of the application of the Poisson index, consider the 
problem of testing whether the following data on the number of defective 
parts found in samples of 1000 parts each are homogeneous: 15, 13, 8, 
6, 11, 9, 14, 10, 16, 9, 12. Since the probability of a part being defective 
is very small and $n$ is very large, these frequencies may be treated as 
having come from Poisson populations. The problem now is one of testing the 
hypothesis 

$$H_0: \mu_1 = \cdots = \mu_{11}$$

where $\mu_i$ is the mean of the Poisson population corresponding to the 
$i$th sample. Calculations give 

$$\bar{x} = 11.2, \qquad \sum (x_i - \bar{x})^2 = 97.6, \qquad \chi^2 = 8.7$$

For 10 degrees of freedom $\chi_0^2 = 18.3$; consequently the result is not 
significant. Thus this test gives no reason for questioning the assumption 
that the data came from the same Poisson population. 
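A corresponding sketch for the Poisson index follows. It also evaluates the 
binomial index (8) with n = 1000 to show how little the factor 
$1 - \bar{x}/n$ matters for data of this kind. The function names are 
assumptions made for the illustration, and 18.3 is the critical value 
quoted in the text. 

```python
def poisson_dispersion_index(xs):
    """Poisson index of dispersion: sum (x_i - mean)^2 / mean, with k - 1 d.f."""
    k = len(xs)
    mean = sum(xs) / k
    ss = sum((x - mean) ** 2 for x in xs)
    return mean, ss, ss / mean, k - 1


def binomial_dispersion_index(xs, n):
    """Binomial index (8); differs from the Poisson index by the factor 1 - mean/n."""
    mean, ss, chi2_poisson, dof = poisson_dispersion_index(xs)
    return chi2_poisson / (1 - mean / n), dof


defectives = [15, 13, 8, 6, 11, 9, 14, 10, 16, 9, 12]   # 11 samples of 1000 parts

mean, ss, chi2, dof = poisson_dispersion_index(defectives)
chi2_binomial, _ = binomial_dispersion_index(defectives, 1000)

critical_5pct = 18.3        # value quoted in the text for 10 d.f.
print(round(mean, 1), round(ss, 1), round(chi2, 1), dof, chi2 > critical_5pct)
print(round(chi2_binomial - chi2, 3))   # effect of the (1 - mean/n) correction
```

For these data the two indices differ by about one part in a hundred, which 
illustrates why the Poisson form is adequate when p is small and n large. 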


REFERENCES 

Investigations have shown that the $\chi^2$ test must be applied with 
discretion when the $e_i$ are small. An interesting example to illustrate 
the errors that may arise is given in E. J. Gumbel, “On the Reliability of 
the Classical Chi-Square Test,” Annals of Mathematical Statistics, 14, 
253-263. 

For 2 x 2 contingency tables there is available a correction to $\chi^2$ 
called Yates’ correction that makes the $\chi^2$ test slightly more accurate 
when some of the $e_i$ are small. This correction is illustrated in 
P. Rider, An Introduction to Modern Statistical Methods, John Wiley and 
Sons. 

A proof of the property of the $\chi^2$ test that permits it to be applied 
to problems in which parameters are replaced by their maximum likelihood 
estimates is very difficult and requires advanced mathematical techniques. 
It may be found in H. Cramer, Mathematical Methods of Statistics, Princeton 
University Press. 



EXERCISES 


1. By integration, verify the .05 critical value of $\chi^2$ given in 
Table III for $\nu = 2$. 

2. Toss a coin 100 times and apply the $\chi^2$ test to see whether the 
coin is unbiased. 

3. In a breeding experiment it was expected that ducks would be hatched in 
the ratio of 1 duck with a white bib to every 3 ducks without bibs. Of |6 ducks 
hatched, 17 had white bibs. Are these data compatible with expectation? 

4. According to Mendelian inheritance, offspring of a certain crossing should 
be colored red, black, or white in the ratios 9 : 3 : 4. If an experiment gave 72, 
35, and 38 offspring in those categories, is the theory substantiated? 

5. The number of individuals possessing the 4 blood types should be in 
the proportions $q^2 : p^2 + 2pq : r^2 + 2qr : 2pr$, where 
$p + q + r = 1$. Given the observed frequencies 180, 360, 132, 98, test for 
compatibility with $p = .4$, $q = .4$, and $r = .2$. 

6. According to the Hardy-Weinberg formula, the number of flies resulting 
from certain crossings should be in the proportions $q^2 : 2pq : p^2$, where 
$q + p = 1$. If an experiment gave the frequencies 42, 52, 22, would the 
results be compatible with this formula (a) if $q = .5$, (b) if $q$ is 
estimated from the data by using the maximum likelihood estimate 
$\hat{q} = \dfrac{2n_1 + n_2}{2(n_1 + n_2 + n_3)}$, where $n_1$, $n_2$, and 
$n_3$ are the observed frequencies in the three categories? 

7. Apply the $\chi^2$ test to the normal curve fit for the following 500 
determinations of the width of a spectral band of light. Here $e$ denotes 
the fitted normal curve frequencies obtained by estimating all the 
parameters. 

o    5    12    43    61    105    103    89    54    19    iHil 
e    5    14    36    71    102    109    85    50    21    1 ■ 2 









8. Given the following data, 











Si.-™,'!' 





X 


2 

3 

4 

5 

6 7 

8 ' 










f 

2 4 

TO 

15 

19 

12 8 7 1 ' 


state how many degrees of freedom you would probably use in the $\chi^2$ 
test if you attempted to fit the histogram with (a) a normal frequency 
function, (b) Poisson frequency function, and (c) binomial frequency 
function with theory suggesting that $n = 10$. 

9. Apply the $\chi^2$ test for goodness of fit to the results of problem 44, 
Chapter 5. 

10. Apply the $\chi^2$ test for goodness of fit to the results of problem 
68, Chapter 5. 

11. Apply the $\chi^2$ test for goodness of fit to the results of problem 
69, Chapter 5. 

12. A certain drug is claimed to be effective in curing colds. In an 
experiment on 164 people with colds, half were given the drug and half were 
given sugar pills. The patients’ reactions to the treatment are recorded in 
the following table. Test the hypothesis that the drug is no better than 
sugar pills for curing colds. 

          Helped    Harmed    No Effect 
Drug        52        10         20 
Sugar       44        12         26 


13. In an epidemic of a certain disease 927 children contracted the disease. 
Of these, 408 received no treatment, and, of those, 104 suffered aftereffects. Of 
the remainder who did receive treatment, 166 suffered aftereffects. Test the 
hypothesis that the treatment was not effective and comment about the conclu- 
sion. 

14. Is there any relation between the mentality and weight of criminals as 
judged by the following data? 

Weight 

Mentality 90-120 120-130 130-140 140-150 150- 


Normal 21 51 94 106 124 

Weak 15 18 34 15 15 


15. The following data are for school children in a city in Scotland. Test to 
see whether hair color and eye color are independently distributed. 


                          Hair 
Eye        Fair    Red    Medium    Dark    Black 
Blue       1368    170     1041      398       1 
Light      2577    474     2703      932      11 
Medium     1390    420     3826     1842      33 
Dark        454    255     1848     2506     112 


16. Show that for a 2 x 2 contingency table with cell frequencies a, b, c, 
and d, respectively, 

$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)}$$

17. The number of automobile accidents per week in a certain city were 12, 
8, 20, 2, 14, 10, 15, 6, 9, 4. Assuming that such frequencies follow a Poisson 
distribution, test the homogeneity of these frequencies with the Poisson index of 
dispersion. 

18. Five boxes of different brands of canned salmon containing 24 cans each 
were examined for high-quality specifications. The number of cans below 
specification were, respectively, 4, 10, 6, 2, 8. Can one conclude that the 5 
brands are of comparable quality ? 





19. The following data give the number of colonies of bacteria that 
developed on 15 different plates from the same dilution. Is one justified in 
claiming that the dilution technique is satisfactory in the sense that the 
bacteria behave as though they were randomly distributed in the dilution? 
The number of colonies 
were 193, 168, 161, 153, 183, 152, 171, 156, 159, 140 , fSl, f52, 1 33, !d4‘ isl 

20. Given the following set of frequencies, 10, 2, 5, 4, 13, 11, 7, 12, 8, 
(a) test to see if they may be treated as Poisson frequencies from the same 
population and (b) determine whether the assumption that they are binomial 
frequencies would be more plausible. 

21. Prove that the estimate used in problem 6 for $q$ is the maximum 
likelihood estimate based on the multinomial distribution. 

22. On the basis of a given hypothesis, indicate why, if an experiment 
yields a value of $\chi^2 = \chi_1^2$ slightly less than the critical value 
for $\nu$ degrees of freedom and if the experiment is repeated with 
approximately the same results, the two experiments combined will yield a 
degree of confidence in the hypothesis different from that given by the 
first experiment alone. 

23. Show that the method of Chapter 6 for testing the difference of 
percentages is equivalent to the $\chi^2$ test when applied to the 2 x 2 
contingency table of successes and failures. It is assumed that $p$ is 
estimated from the combined sample in the difference of percentages method. 

24. Use the table of random numbers to sample from the population given by 

0 1 2 

.4 .4 .2 

Take samples of 25 each and perform 20 (or more) such sampling experiments. 
For each sample of 25, calculate the value of $\chi^2$ for observed and 
expected frequencies in the 3 cells. Classify the 20 (or more) values of 
$\chi^2$ into a frequency table. Compare the resulting histogram with the 
$\chi^2$ curve for $\nu = 2$. As a class exercise, this is intended to make 
the $\chi^2$ theory concerning the use of the continuous $\chi^2$ 
distribution for the discrete $\chi^2$ variable more plausible. 



CHAPTER 11 


Small Sample Distributions 


Many of the statistical techniques considered in the preceding chapters 
are applicable only when large samples are available. For example, 
the method used in 6.7.1 for testing the hypothesis that two population 
means are equal assumes that the samples are so large that population 
variances may be replaced by their sample estimates without appreciably 
affecting the validity of the test. In this chapter methods are developed 
that do not require the assumption of large samples. Although such 
methods are called small sample methods, they obviously apply to large 
samples as well and might better have been called exact methods. Some 
small sample methods require more information or assumptions than 
the corresponding large sample methods; consequently, small sample 
techniques cannot completely displace the techniques designed for large 
samples. 


11.1 Distribution of a Function of Random Variables 

In developing small sample techniques it is often necessary to find the 
distribution of a function of a single basic random variable or the distri- 
bution of a function of several basic random variables. The technique 
for solving the first of these two problems was developed in 5.4.3. In 
this section methods are presented for solving the second problem. 

Let $x$ and $y$ be two continuous variables with the frequency function 
$f(x, y)$ and consider the problem of finding the frequency function of the 
variable $z = t(x, y)$, where $t$ is some function of interest. The 
particular functions that are of interest in this chapter are 
$t(x, y) = y - x$ and $t(x, y) = y/x$; however, it is desirable to have 
available a general method of attack for such problems. 

One method of approach is to adapt the change of variable technique 
of 5.4.3 to functions of two variables by holding one of the variables 
fixed. Toward this end, suppose the value of $x$ is fixed so that the 
relation $z = t(x, y)$ becomes a relation between the random variables $z$ 
and $y$ only. Assume that $t(x, y)$ is an increasing, or decreasing, 
function of $y$. Then, for $x$ fixed, the relation $z = t(x, y)$ represents 
a change of variable from $y$ to $z$ to which formula (42), Chapter 5, 
applies. If $g(y \mid x)$ and $k(z \mid x)$ are used to denote the 
conditional frequency functions of $y$ and $z$, respectively, for $x$ fixed, 
then by that formula 

(1)   $$k(z \mid x) = \frac{g(y \mid x)}{\left|\dfrac{dz}{dy}\right|}$$

Next, write $f(x, y)$ in the factored form 

$$f(x, y) = f(x)\, g(y \mid x)$$

Similarly, if $h(x, z)$ denotes the joint frequency function of $x$ and 
$z$, one can write 

$$h(x, z) = f(x)\, k(z \mid x)$$

Taking the ratio of these two joint frequency functions and using (1) will 
then yield the formula 

(2)   $$h(x, z) = \frac{f(x, y)}{\left|\dfrac{dz}{dy}\right|}$$

In this formula it is necessary to replace $y$ by its value in terms of $x$ 
and $z$ by means of the relation $z = t(x, y)$. 

Formula (2) gives the joint frequency function of $x$ and $z$ in terms of 
that of $x$ and $y$. In order to obtain the frequency function of $z$, it is 
therefore merely necessary to integrate $h(x, z)$ with respect to $x$ over 
the entire range of $x$ values for $z$ fixed. This follows from formula (3), 
Chapter 8, for marginal distributions. 

As an application of this technique, consider the problem of finding 
the frequency function of the ratio $z = y/x$ when $x$ and $y$ are 
independently distributed. Since $f(x, y) = f(x)g(y)$ and $dz/dy = 1/x$ 
here, it follows directly from formula (2) that 

$$h(x, z) = |x|\, f(x)\, g(y) = |x|\, f(x)\, g(zx)$$

The frequency function of $z$, say $q(z)$, is therefore given by 

(3)   $$q(z) = \int |x|\, f(x)\, g(zx)\, dx$$

where the integration is over the range of $x$ values for $z$ fixed. 





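As a special case of (3), let $f(x) = e^{-x}$, $x > 0$, and 
$g(y) = e^{-y}$, $y > 0$. Then (3) yields 

$$q(z) = \int_0^\infty x e^{-x} e^{-zx}\, dx = \int_0^\infty x e^{-x(1 + z)}\, dx$$

The substitution $w = x(1 + z)$ will lead to the result 

$$q(z) = (1 + z)^{-2}, \qquad z > 0$$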
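This special case is easy to check numerically. The sketch below evaluates 
the integral in (3) by the midpoint rule, taking both frequency functions 
to be the unit exponential, and compares it with $(1 + z)^{-2}$. The 
function name q_numeric, the truncation point of the integral, and the 
number of steps are all choices made for this illustration. 

```python
from math import exp


def q_numeric(z, upper=40.0, steps=40000):
    """Midpoint-rule value of integral_0^upper x * f(x) * g(z x) dx
    with f(x) = exp(-x) and g(y) = exp(-y); the tail beyond `upper`
    is negligible for z >= 0."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * exp(-x) * exp(-z * x)
    return total * h


def q_exact(z):
    """Closed form obtained from the substitution w = x(1 + z)."""
    return (1.0 + z) ** -2


for z in (0.0, 0.5, 1.0, 3.0):
    print(z, round(q_numeric(z), 6), round(q_exact(z), 6))
```

The two columns agree to the accuracy of the integration rule, which 
supports the closed form without replacing the analytic derivation. 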

As a second application of this general technique, consider the problem 
of finding the frequency function of the difference $z = y - x$. Here 
$dz/dy = 1$; consequently (2) reduces to 

$$h(x, z) = f(x, y) = f(x, x + z)$$

The frequency function of $z$ is therefore given by 

(4)   $$q(z) = \int f(x, x + z)\, dx$$

where the integration is over the range of $x$ values for $z$ fixed. 

The only real difficulty in finding the frequency function of a variable 
z = t{x, y) by^ means of the preceding technique lies in selecting the proper 
limits of integration when integrating the function h{x, z) with respect to 
od. The following problem illustrates the nature of such difficulties. 

Let f{x, y) = %xy^ 0<a:<l, 0 < y < x, and let z — x + y. Then 
dzjdy = 1 and (2) reduces to 

h{x^ z) = f{x, y) = f{x^z — x) ^ %x{z — x) 

Now when 2 is fixed, x can range over only those values that correspond 
to points of the sample space lying on the line whose equation is a? + y = 2 
and whose graph is shown in Fig. 1. The sample space here is the triangle 
bounded by the lines y = x, x := 1, and 2 / = 0. If 2 is fixed at any value 
satisfying 2 < 1, as indicated by line /j in Fig. 1, then the range of possible 
X values is x == zjl to x = z. However, if 2 > 1, as shown by line 4, then 
the range is x = zjl to = 1. As a consequence, the frequency function 
of 2 is given by the two formulas 

k(z) = J Sx(z — x) dx = § 2 ^, 0 < 2 < 1 

2 

= J Sx(z — x) dx ^ —§2^ + 42 — I, 1 < 2<2 
2 

The graph of this frequency function is shown in Fig, 2. 
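The choice of limits in this problem can be checked numerically. The sketch below integrates 8x(z - x) from x = z/2 to x = min(z, 1) by the midpoint rule, compares the answer with the two-piece formula, and confirms that the two pieces together have total area 1. All function names are introduced here for illustration.

```python
def h(x, z):
    """Joint frequency function h(x, z) = 8 x (z - x) obtained from formula (2)."""
    return 8.0 * x * (z - x)

def density_by_integration(z, steps=20000):
    """Integrate h(x, z) over the x range allowed for this z (midpoint rule)."""
    lower = z / 2.0
    upper = z if z <= 1.0 else 1.0
    dx = (upper - lower) / steps
    return sum(h(lower + (i + 0.5) * dx, z) for i in range(steps)) * dx

def density_closed_form(z):
    """The two-piece frequency function of z = x + y."""
    if 0.0 < z <= 1.0:
        return 2.0 * z ** 3 / 3.0
    if 1.0 < z < 2.0:
        return -2.0 * z ** 3 / 3.0 + 4.0 * z - 8.0 / 3.0
    return 0.0

# Total area under the two-piece frequency function over 0 < z < 2.
area_steps = 20000
total_area = sum(
    density_closed_form((i + 0.5) * 2.0 / area_steps) for i in range(area_steps)
) * 2.0 / area_steps

for z in (0.6, 1.5):
    print(z, density_by_integration(z), density_closed_form(z))
print(total_area)
```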

A somewhat more general problem arises when the joint distribution of 
two functions of the basic variables, say u = u(x, y) and v = v(x, y), 
is desired. This corresponds to a change from the coordinate system x, y 
to the coordinate system u, v. This is a familiar procedure in calculus, for 
example, when performing a double integration and accomplishing it by 
shifting to polar coordinates. The functions in that case are given by 
$r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x)$. Here the problem would be to find out 
how r and $\theta$ are distributed when given the distribution of x and y. 

There exists a simple formula for finding the frequency function of such 
transformed variables. It is obtained by applying probability considerations 
to an advanced calculus formula for integration and involves the 
Jacobian function. This formula can be extended to any number of 
variables as well. The theory that is developed in this book does not 
require the use of these more general methods; however, a brief discussion 
of them is given in the appendix for the benefit of those who are familiar 
with advanced calculus methods and wish to become acquainted 
with the general methods. 

The methods that have been explained in this section are now used to 
develop some of the theory of small samples. 


11.2 The $\chi^2$ Distribution 

One of the most widely used continuous frequency functions in statistical 
work is the $\chi^2$ function that arose in connection with radial error 
problems in Chapter 6 and with the problem of testing goodness of fit 
in Chapter 10. This function has many other applications as well. In 
this section it is used to assist in finding the frequency function of the 
sample variance when random samples are drawn from a normal population. 


11.2.1 Distribution of $s^2$ 

Let x be normally distributed with mean $\mu$ and variance $\sigma^2$ and let $\bar{x}$ 
and $s^2$ be the usual sample estimates of these parameters based on a 
random sample of size n. 

Now if the mean $\mu$ were known, one would use the estimate $\sum (x_i - \mu)^2/n$ 
for $\sigma^2$. It would be a simple matter to find the distribution of this 
estimate because the quantity 

$$\sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n} y_i^2 \tag{5}$$

is the sum of squares of n random sample values of a normal variable y 
with zero mean and unit variance and therefore by Theorem 6, Chapter 6, 
possesses a $\chi^2$ distribution with n degrees of freedom. Then by the change 
of variable technique developed in 5.4.3 one could find the frequency 
function of $\sum (x_i - \mu)^2/n$. 

The difficulty in finding the distribution of $s^2$ arises from the presence 
of $\bar{x}$ in place of $\mu$. In order to make allowance for this, it is necessary to 
carry out certain manipulations. After these have been made, it can be 
shown by moment generating function methods that the $\chi^2$ distribution 
is still applicable. 



Obvious algebraic operations will show that 

$$ns^2 = \sum (x_i - \bar{x})^2 = \sum [(x_i - \mu) - (\bar{x} - \mu)]^2 = \sum (x_i - \mu)^2 - n(\bar{x} - \mu)^2$$

Because of the convenience of working with standard units, this relationship 
is divided by $\sigma^2$ and then written in the form 

$$\frac{ns^2}{\sigma^2} + \left(\frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}\right)^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2$$

or symbolically as 

$$J + K = L$$

If the moment generating function of both sides is taken, 

$$M_{J+K}(\theta) = M_L(\theta) \tag{6}$$

Now it can be shown that $\bar{x}$ and $s^2$ are independently distributed when 
the basic variable x is normally distributed. A proof of this property is 
given in the appendix. This fact is therefore assumed here. Since J is a 
function of $s^2$ and K is a function of $\bar{x}$, it follows from the independence 
of $\bar{x}$ and $s^2$ that J and K are independently distributed. The independence 
of J and K permits the left side of (6) to be factored; therefore (6) may be 
written in the form 

$$M_J(\theta)\,M_K(\theta) = M_L(\theta)$$

Since J is the variable of interest here, this relationship is written in the 
form 

$$M_J(\theta) = \frac{M_L(\theta)}{M_K(\theta)} \tag{7}$$

From the discussion following (5), it follows that L possesses a $\chi^2$ 
distribution with n degrees of freedom. Now the variable $(\bar{x} - \mu)\sqrt{n}/\sigma$ is 
a normal variable with zero mean and unit variance; therefore it constitutes 
a random sample of size 1 from such a variable. The same reasoning 
as before shows that K possesses a $\chi^2$ distribution with one degree of 
freedom. 

The moment generating function of a $\chi^2$ variable with $\nu$ degrees of 
freedom is given by formula (21), Chapter 6, namely 

$$M_{\chi^2}(\theta) = (1 - 2\theta)^{-\nu/2} \tag{8}$$

Application of this formula to (7) will yield 

$$M_J(\theta) = \frac{(1 - 2\theta)^{-n/2}}{(1 - 2\theta)^{-1/2}} = (1 - 2\theta)^{-(n-1)/2}$$





Because a frequency function is uniquely determined by its moment 
generating function, this result together with (8) proves the following 
theorem. 

Theorem 1: If x is normally distributed with variance $\sigma^2$ and $s^2$ is the 
sample variance based on a random sample of size n, then $ns^2/\sigma^2$ has a 
$\chi^2$ distribution with n - 1 degrees of freedom. 
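Theorem 1 can be illustrated by simulation. The sketch below draws repeated normal samples, forms $ns^2/\sigma^2$ with $s^2$ computed with divisor n, and compares the average and variance of the simulated values with n - 1 and 2(n - 1), the mean and variance of a $\chi^2$ variable with n - 1 degrees of freedom. The sample size, parameter values, seed, and number of trials are arbitrary choices made here.

```python
import random

def simulate_ns2_over_sigma2(n, mu, sigma, trials, seed=0):
    """Return simulated values of n*s^2/sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ns2 = sum((x - mean) ** 2 for x in sample)  # n times the divisor-n variance
        values.append(ns2 / sigma ** 2)
    return values

values = simulate_ns2_over_sigma2(n=5, mu=10.0, sigma=2.0, trials=20000)
avg = sum(values) / len(values)
var = sum((v - avg) ** 2 for v in values) / len(values)
# With n = 5 the theorem gives 4 degrees of freedom: mean 4 and variance 8.
print(avg, var)
```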

Although the name "degrees of freedom" is merely a name given to the 
parameter $\nu$ in the $\chi^2$ distribution, it is well chosen because the parameter 
$\nu$ represents the number of independent variables whose sum of squares 
is a $\chi^2$ variable. Thus (5) has $\nu = n$ because the n variables being squared 
and summed are independent, whereas the n variables being squared and 
summed in $s^2$ contain only n - 1 independent variables because the sum 
of the variables is 0. 


11.2.2 Additive Nature of $\chi^2$ 

An interesting and useful property of the $\chi^2$ distribution is that the sum 
of two or more independent $\chi^2$ variables possesses a $\chi^2$ distribution also. 
This property is demonstrated now because it is needed in the next section. 

Let $\chi_1^2$ and $\chi_2^2$ possess independent $\chi^2$ distributions with $\nu_1$ and 
$\nu_2$ degrees of freedom, respectively. Consider the variable $w = \chi_1^2 + \chi_2^2$. 
From moment generating function properties and (8), it follows that 

$$M_w(\theta) = M_{\chi_1^2}(\theta)\,M_{\chi_2^2}(\theta) = (1 - 2\theta)^{-\nu_1/2}(1 - 2\theta)^{-\nu_2/2} = (1 - 2\theta)^{-(\nu_1 + \nu_2)/2}$$

But this is of the same form as (8); therefore, the following theorem holds. 

Theorem 2: If $\chi_1^2$ and $\chi_2^2$ possess independent $\chi^2$ distributions with 
$\nu_1$ and $\nu_2$ degrees of freedom, respectively, then $\chi_1^2 + \chi_2^2$ possesses a 
$\chi^2$ distribution with $\nu_1 + \nu_2$ degrees of freedom. 


11.3 Applications of the $\chi^2$ Distribution 

In this section Theorems 1 and 2 are used to test hypotheses about, 
and obtain confidence limits for, the variance of a normal variable. 

As a first illustration, consider a problem of testing a hypothetical 
value of $\sigma$. If past experience with the quality of a manufactured product 
has shown that $\sigma = 7.5$ for the quality variable in question, and if the 
latest sample of size 25 gave a value of s = 10, would there be justification 
for believing that the variability of the quality had increased? This 
problem may be treated as a problem of testing the hypothesis 

$$H_0 : \sigma = 7.5$$

against the alternative hypothesis 

$$H_1 : \sigma > 7.5$$

From Theorem 1, $ns^2/\sigma^2$ possesses a $\chi^2$ distribution with 24 degrees 
of freedom. If the right tail of the $\chi^2$ distribution is chosen as the critical 
region for testing $H_0$ against $H_1$, it will be found from Table III that the 
critical value of $\chi^2$ is given by $\chi_0^2 = 36.4$. Since 

$$\frac{ns^2}{\sigma^2} = \frac{25(10)^2}{(7.5)^2} = 44$$

the hypothesis $H_0$ is rejected in favor of $H_1$, which implies that there is 
justification for believing that the variability has increased. 
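The arithmetic of this test can be reproduced without a table. The sketch below computes $ns^2/\sigma^2$ and its right-tail probability. It relies on a standard closed form for the $\chi^2$ tail that holds only when the number of degrees of freedom is even; that closed form and the function name are brought in here for illustration and are not the method of Table III.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) when df is a positive even integer.
    For df = 2m the tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

n, s, sigma0 = 25, 10.0, 7.5
statistic = n * s ** 2 / sigma0 ** 2          # about 44
p_value = chi2_upper_tail_even_df(statistic, n - 1)
critical_tail = chi2_upper_tail_even_df(36.4, n - 1)  # close to .05
print(statistic, p_value, critical_tail)
```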

The solution of problem 8, Chapter 9, shows that the right tail of the 
$\chi^2$ distribution is the best critical region for testing $H_0$ against $H_1$ 
provided that the mean of x is 0. If the mean were $\mu$ rather than 0, one would 
use $\sum (x_i - \mu)^2 > c$ in place of $\sum x_i^2 > c$ to define the best critical region. 
When the mean is not known, as in the problem just solved, it can be 
shown by methods somewhat more complicated than those used to 
solve problem 8 that $\sum (x_i - \bar{x})^2 > c$, where c is chosen properly, defines a 
restricted type of best critical region for the problem being discussed. 
Since $ns^2/\sigma_0^2 > \chi_0^2$ is equivalent to $\sum (x_i - \bar{x})^2 > c$, where $c = \sigma_0^2\chi_0^2$, it 
follows that the test employed in solving this problem is a restricted type 
of best test. 

If one were to test the hypothesis $H_0 : \sigma = \sigma_0$ against the alternative 
$H_1 : \sigma < \sigma_0$, one would use the left tail of the $\chi^2$ distribution to obtain a 
best test; however, if the problem were one of testing $H_0 : \sigma = \sigma_0$ against 
$H_1 : \sigma \neq \sigma_0$, then methods like those of Chapter 9 will show that there does 
not exist a best critical region in this case. For this last type of alternative 
it is customary to use the two equal tails of the $\chi^2$ distribution as the 
critical region. 

As a second application of the $\chi^2$ distribution, consider the problem 
of finding confidence limits for $\sigma^2$. Let x be normally distributed with 
variance $\sigma^2$ and let $s^2$ be the sample variance based on a random sample 
of size n. Then 95 per cent confidence limits for $\sigma^2$ may be obtained by 
using the analytical methods explained in 9.2.4 in the following manner. 

From Table III for n - 1 degrees of freedom find two values of $\chi^2$, 
namely, $\chi_1^2$ and $\chi_2^2$, such that the probability is .975 that $\chi^2 > \chi_1^2$ and such 
that the probability is .025 that $\chi^2 > \chi_2^2$. Then it follows from Theorem 
1 that the probability is .95 that 

$$\chi_1^2 < \frac{ns^2}{\sigma^2} < \chi_2^2$$

or that 

$$\frac{ns^2}{\chi_2^2} < \sigma^2 < \frac{ns^2}{\chi_1^2} \tag{9}$$

These two numbers yield 95 per cent confidence limits for $\sigma^2$. From the 
discussion in the section on confidence intervals it follows that in the 
long run 95 per cent of the inequalities of this type that are computed 
will be true inequalities. This method, of course, is not restricted to 95 
per cent limits. 

As a numerical illustration of the use of formula (9), consider once 
more the data for the first illustration of this section. Since the 
hypothetical value of $\sigma = 7.5$ was rejected, one would use the sample value 
s = 10, or the unbiased version of it, as the point estimate of $\sigma$; however, 
if one were interested in an interval estimate, (9) would be used. Here 
n = 25 and $ns^2 = 2500$. A direct application of (9) and Table III will 
show that 96 per cent confidence limits for $\sigma^2$ are given by 

$$\frac{2500}{40.27} < \sigma^2 < \frac{2500}{11.99}$$

This inequality is equivalent to 

$$7.9 < \sigma < 14.4$$

It is clear from this result that $\sigma$ cannot be estimated with much precision 
for such a small sample and such variable data. 
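The limits just obtained follow from formula (9) by simple division. A minimal sketch, with a helper name chosen here for illustration and the two table values 11.99 and 40.27 taken from the text:

```python
import math

def variance_limits(ns2, chi_lower, chi_upper):
    """Formula (9): ns^2/chi_upper < sigma^2 < ns^2/chi_lower."""
    return ns2 / chi_upper, ns2 / chi_lower

low, high = variance_limits(2500.0, chi_lower=11.99, chi_upper=40.27)
sigma_low, sigma_high = math.sqrt(low), math.sqrt(high)
print(low, high)              # limits for the variance
print(sigma_low, sigma_high)  # limits for the standard deviation
```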

As a third illustration, consider the problem of finding confidence 
limits for $\sigma^2$ when several sample variances are available. In particular, 
consider the data given just after (16), Chapter 9, namely, $s_1^2 = 237$, 
$s_2^2 = 320$, $s_3^2 = 853$, $s_4^2 = 296$, and $s_5^2 = 141$. Since each of these 
variances is based on a random sample of size 5, Theorem 1 shows that 
the variables $5s_i^2/\sigma^2$ (i = 1, ..., 5) will possess independent $\chi^2$ 
distributions with four degrees of freedom each. By Theorem 2 their sum, 
$\sum 5s_i^2/\sigma^2$, will therefore possess a $\chi^2$ distribution with 20 degrees of 
freedom. Since $\sum 5s_i^2 = 9235$, formula (9) and Table III will then yield 
the following 96 per cent confidence limits for $\sigma^2$: 

$$\frac{9235}{35.02} < \sigma^2 < \frac{9235}{9.237}$$

or 

$$264 < \sigma^2 < 1000$$
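The pooling of the five daily variances can be retraced in a few lines. The sketch below forms the sum of the quantities $5s_i^2$, counts the degrees of freedom supplied by Theorem 2, and divides by the two table values 35.02 and 9.237 quoted in the text. The variable names are chosen here for illustration.

```python
daily_variances = [237, 320, 853, 296, 141]  # the five sample variances
n = 5                                        # size of each daily sample

pooled = sum(n * s2 for s2 in daily_variances)   # sum of n * s_i^2
df = len(daily_variances) * (n - 1)              # four degrees of freedom per day

lower = pooled / 35.02   # divide by the upper table value
upper = pooled / 9.237   # divide by the lower table value
print(pooled, df, lower, upper)
```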



For data of the type just considered, the technique of combining several 
sample variances to obtain an estimate of $\sigma^2$ has certain advantages over 
the customary method of combining all the data to obtain a single direct 
estimate of $\sigma^2$. In the problem considered it may be that the variability 
of the product is unchanged from day to day but that the mean has 
changed. If all the data were combined, the change in the mean would 
tend to increase the value of $s^2$ over what it would be if the mean were 
stable from day to day. The sum of the daily values of $s^2$, however, would 
not be affected by such changes in the mean. Thus, by using the sums of 
daily variances, one may be able to obtain a valid estimate of $\sigma^2$ even 
though the product is not strictly under control. Here $\sigma^2$ is understood 
to be the population variance of the product when shifts in the mean do 
not occur. 

11.4 Student's t Distribution 

Consider the data of Table 1 on the additional hours of sleep gained by 
10 patients in an experiment with a certain drug. The problem is to 
determine whether these data justify the claim that the drug does produce 
additional sleep. 

Table 1 

Patient   1   2   3   4   5   6   7   8   9   10 

Assume that these patients may be treated as a random sample of 
size 10 from a population of such patients. Furthermore, assume that 
the number of additional hours of sleep that a patient obtains from the 
use of this drug is a normally distributed variable. The problem may then 
be treated as a problem of testing the hypothesis 

$$H_0 : \mu = 0$$

against the alternative 

$$H_1 : \mu > 0$$

If this problem were treated in the traditional large-sample manner of 
Chapter 6, the experimenter would use the data of Table 1 to obtain 

$$\bar{x} = 1.24 \quad \text{and} \quad s = 1.45$$

Then he would calculate 

$$\tau = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - 0}{\sigma/\sqrt{n}}$$

and approximate its value by replacing $\sigma$ by s to obtain 

$$\tau = \frac{1.24\sqrt{10}}{1.45} = 2.70$$

From Table II, the probability of obtaining a value of $\tau > 2.70$ is .0035; 
consequently the hypothesis that $\mu = 0$ would be rejected here in favor 
of the alternative that $\mu > 0$. The drug undoubtedly has a beneficial 
effect with respect to sleep, even though it may be due to psychological 
factors affecting the patient. 
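The large-sample calculation can be reproduced with the complementary error function in place of Table II. The sketch below is only an illustration of that arithmetic; the function name is chosen here.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mean, s, n = 1.24, 1.45, 10
tau = mean * math.sqrt(n) / s        # sigma replaced by the sample value s
p_large_sample = normal_upper_tail(tau)
print(tau, p_large_sample)
```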

This method of solving the problem is subject to one serious objection. 
For a sample as small as this, the sample standard deviation, s, will not 
be an accurate estimate of $\sigma$; consequently a serious error may be 
introduced in the value of $\tau$ in replacing $\sigma$ by its sample estimate. In most 
applied problems the true standard deviation is unknown. In order to 
overcome this defect in the test, it is necessary to replace the random 
variable $\tau$ by a new random variable which involves the sample standard 
deviation rather than the population standard deviation. Such 
considerations will lead to what is known as Student's t distribution. 

Although the t distribution is being introduced here to solve a particular 
problem, it has many other important applications. In its most general 
form a Student t variable is a variable of the type 

$$t = \frac{u\sqrt{\nu}}{v} \tag{10}$$

where u is a standard normal variable and $v^2$ is a $\chi^2$ variable with $\nu$ degrees 
of freedom distributed independently of u. 

The frequency function of t can be obtained by finding the frequency 
functions of the numerator and denominator of t and then applying 
formula (3). 

The numerator variable $u\sqrt{\nu}$, which is denoted by y, is a normal 
variable with mean zero and variance $\nu$ because u is a standard normal 
variable; consequently the frequency function of y, which is denoted by 
k(y), is given by 

$$k(y) = \frac{e^{-y^2/2\nu}}{\sqrt{2\pi\nu}} \tag{11}$$


The denominator variable v is the square root of a $\chi^2$ variable; therefore 
its distribution can be found by using the change of variable technique 
that was explained in 5.4.3. Toward this end, let the variables x and y 
of that section be set equal to $x = v^2$ and y = v. Then the required 
change of variable is given by the relationship $y = \sqrt{x}$. Application of 
formula (42), Chapter 5, then yields 

$$g(v) = f(v^2)\,2v$$

But $v^2$ is a $\chi^2$ variable with $\nu$ degrees of freedom whose frequency function 
is given by (20), Chapter 6; consequently 

$$g(v) = a\,(v^2)^{\nu/2 - 1} e^{-v^2/2} \cdot 2v = 2a\,v^{\nu - 1} e^{-v^2/2} \tag{12}$$

Here a is the distribution constant $1/2^{\nu/2}\Gamma(\nu/2)$. 

In order to apply formula (3) to (10), it is necessary to associate the 
variable v with x and the variable $u\sqrt{\nu}$ with y. The function f(x) of (3) 
is therefore given by replacing v by x in (12). The function g(y) of (3) is 
given by k(y) in (11). Finally, it is necessary to associate the variable t 
with the variable z. After these substitutions in notation have been made, 
formula (3) when applied to (10) will yield 

$$q(t) = \int_0^\infty x \cdot 2a\,x^{\nu - 1} e^{-x^2/2} \cdot \frac{e^{-t^2x^2/2\nu}}{\sqrt{2\pi\nu}}\,dx = \frac{2a}{\sqrt{2\pi\nu}} \int_0^\infty x^{\nu} e^{-\frac{x^2}{2}\left(1 + \frac{t^2}{\nu}\right)}\,dx$$

Now let $w = x^2(1 + t^2/\nu)/2$; then $dx = dw/\sqrt{2w}\sqrt{1 + t^2/\nu}$ and 

$$q(t) = \frac{2^{\nu/2} a}{\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{1}{2}(\nu + 1)} \int_0^\infty w^{\frac{1}{2}(\nu - 1)} e^{-w}\,dw$$

From the derivation in 6.9 it will be observed that this last integral is 
equal to $\Gamma[(\nu + 1)/2]$; consequently 

$$q(t) = c \left(1 + \frac{t^2}{\nu}\right)^{-\frac{1}{2}(\nu + 1)}$$

where c is the constant 

$$c = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \tag{13}$$



The preceding derivation proves the following theorem. 





Theorem 3: If u is normally distributed with zero mean and unit variance 
and $v^2$ has a $\chi^2$ distribution with $\nu$ degrees of freedom, and u and v are 
independently distributed, then the variable 

$$t = \frac{u\sqrt{\nu}}{v}$$

has a Student's t distribution with $\nu$ degrees of freedom given by 

$$f(t) = c \left(1 + \frac{t^2}{\nu}\right)^{-\frac{1}{2}(\nu + 1)}$$

where c is the constant given in (13). 

Now consider once more the problem that was introduced at the 
beginning of this section in order to see how this theorem can remedy 
the defect in the large sample method of solution. Since $\bar{x}$ is normally 
distributed with 0 mean under the hypothesis being tested, the variable 

$$u = \frac{\bar{x}}{\sigma_{\bar{x}}} = \frac{\bar{x}\sqrt{n}}{\sigma}$$

possesses the properties of u in Theorem 3. From Theorem 1 it follows 
that 

$$v^2 = \frac{ns^2}{\sigma^2}$$

possesses the properties of $v^2$ in Theorem 3 with $\nu = n - 1$. Since it is 
known that $\bar{x}$ and $s^2$ are independently distributed, Theorem 3 may be 
applied to give 

$$t = \frac{u\sqrt{\nu}}{v} = \frac{\bar{x}\sqrt{n - 1}}{s} = \frac{1.24\sqrt{9}}{1.45} = 2.57, \qquad \nu = 9$$

From Table IV it will be found that the probability is approximately 
.017 of obtaining a value of t > 2.57. This result is therefore significant 
at the 5 per cent significance level. 
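The frequency function of Theorem 3 can be evaluated directly, which gives a check on the tail probability read from Table IV. The sketch below computes the constant c of (13) with the gamma function and obtains P(t > 2.57) by the midpoint rule, using the symmetry of the frequency function about zero. The function names and step count are choices made here; the result should be of the same order as the approximate table value and well above the large-sample value .0035.

```python
import math

def t_density(t, nu):
    """Student's t frequency function c * (1 + t^2/nu)^(-(nu + 1)/2)."""
    c = math.gamma((nu + 1) / 2.0) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2.0))
    return c * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

def t_upper_tail(t0, nu, steps=20000):
    """P(t > t0) for t0 >= 0: one half minus the area between 0 and t0."""
    dt = t0 / steps
    area = sum(t_density((i + 0.5) * dt, nu) for i in range(steps)) * dt
    return 0.5 - area

mean, s, n = 1.24, 1.45, 10
t_value = mean * math.sqrt(n - 1) / s
p_small_sample = t_upper_tail(t_value, n - 1)
print(t_value, p_small_sample)
```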

A comparison of the probability of P = .017 with that of P = .0035 
obtained by the use of large sample methods shows that the large sample 
method is not accurate for a sample as small as 10. It will be found that 
the large sample method gives probabilities that are consistently too 
small; consequently large sample methods will claim significant results 
more often than is justified. The explanation for this bias on the part of 
large sample methods is that the t distribution has a slightly larger 
dispersion than the standard normal distribution. The situation is shown 
graphically in Fig. 3, which gives the graphs of the standard normal 
distribution and Student's t distribution for four degrees of freedom. 

Fig. 3. Standard normal and Student's t distributions. 

The important feature of the t distribution is that it does not depend 
on any unknown population parameters, hence there is no necessity for 
replacing parameter values by questionable sample estimates as there is in 
the large sample normal curve method. 


11.5 Applications of the t Distribution 

11.5.1 Confidence Limits for a Mean 

Let x be normally distributed with mean $\mu$ and variance $\sigma^2$. Let $\bar{x}$ 
and $s^2$ be their sample estimates based on a random sample of size n. 
Then, as before, 

$$u = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{and} \quad v^2 = \frac{ns^2}{\sigma^2}$$

satisfy the requirements of u and $v^2$ in Theorem 3; consequently, 

$$t = \frac{(\bar{x} - \mu)\sqrt{n - 1}}{s} \tag{14}$$

possesses a t distribution with n - 1 degrees of freedom. If $t_{.05}$ represents 
the value of t such that the probability is .05 that $|t| > t_{.05}$, then the 
probability is .95 that 

$$\frac{|\bar{x} - \mu|\sqrt{n - 1}}{s} < t_{.05}$$

or that 

$$\bar{x} - t_{.05}\frac{s}{\sqrt{n - 1}} < \mu < \bar{x} + t_{.05}\frac{s}{\sqrt{n - 1}} \tag{15}$$

This inequality determines a 95 per cent confidence interval for $\mu$. Since 
the probabilities heading the columns of Table IV are for one tail only, 
it is necessary to look in the column headed .025 in order to find the value 
of $t_{.05}$ needed in (15). If some probability other than .95 is desired, it is 
merely necessary to replace $t_{.05}$ by the corresponding value of t from 
Table IV, once more looking in the column headed with half the 
probability attached to t. The entries in the last row of Table IV, which are 
those for a standard normal variable, enable one to observe how rapidly 
Student's t distribution approaches that of a standard normal variable 
as the sample size increases. They also enable one to select the correct 
column in looking up critical values of t because of familiarity with large 
sample normal curve critical values such as 1.64 and 1.96. 
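As an illustration of formula (15), the sketch below applies it to the sleep data summary of 11.4, namely a mean of 1.24 and s = 1.45 for n = 10. This interval is not worked out in the text; the critical value 2.262 used here is the usual two-sided 5 per cent value of t for 9 degrees of freedom, and the function name is chosen for illustration.

```python
import math

def mean_confidence_limits(xbar, s, n, t_crit):
    """Formula (15): xbar -/+ t_crit * s / sqrt(n - 1), with s the divisor-n value."""
    half_width = t_crit * s / math.sqrt(n - 1)
    return xbar - half_width, xbar + half_width

low, high = mean_confidence_limits(xbar=1.24, s=1.45, n=10, t_crit=2.262)
print(low, high)
```

Because the lower limit is positive, the interval agrees with the rejection of a zero mean at the 5 per cent level.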


11.5.2 Difference of Two Means 

The t distribution may be used to eliminate the error in large sample 
methods when testing the difference of two means in the same manner 
as for testing one mean. Let x and y be normally distributed with means 
$\mu_x$ and $\mu_y$ and with the same variance $\sigma^2$. Let random samples of sizes 
$n_x$ and $n_y$ be taken from these two populations. Denote the sample means 
and variances by $\bar{x}$, $\bar{y}$, $s_x^2$, and $s_y^2$. Then 

$$u = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sigma_{\bar{x} - \bar{y}}} = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sigma\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$$

will possess the required properties of u in Theorem 3. Furthermore 

$$v^2 = \frac{n_x s_x^2 + n_y s_y^2}{\sigma^2}$$

with $\nu = n_x + n_y - 2$ degrees of freedom, is easily seen to possess the 
properties of $v^2$ in Theorem 3. This follows from Theorems 1 and 2 
because 

$$\frac{n_x s_x^2}{\sigma^2} \quad \text{and} \quad \frac{n_y s_y^2}{\sigma^2}$$

possess independent $\chi^2$ distributions with $n_x - 1$ and $n_y - 1$ degrees of 
freedom, respectively. Consequently 

$$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{n_x s_x^2 + n_y s_y^2}} \sqrt{\frac{n_x n_y (n_x + n_y - 2)}{n_x + n_y}}, \qquad \nu = n_x + n_y - 2 \tag{16}$$

will have Student's t distribution with $n_x + n_y - 2$ degrees of freedom. 
Then, to test the hypothesis that $\mu_x = \mu_y$, it is merely necessary to calculate 
the value of t and use Table IV to see whether the sample value of t 
numerically exceeds the critical value. 

It will be noted that the value of t does not depend on any population 
parameters as in the large sample method explained in 6.7.1. It will also 
be noted, however, that the t test is less general than the large sample 
method because here it is necessary to assume equality of the variances, 
which was not true for the large sample approach. 

Formula (16) may also be used to determine confidence limits for 
$\mu_x - \mu_y$. If it has been shown that the hypothesis $\mu_x = \mu_y$ is not a 
reasonable one, it may be of interest to know how large or how small a difference 
is reasonable. For a given probability, confidence limits for $\mu_x - \mu_y$ 
will give the desired answer. 

As a numerical illustration, consider the data of Table 2 on the yield 
of corn in bushels per plot on 20 experimental plots of ground, half of 
which were treated with phosphorus as a fertilizer. 


Table 2 


Treated 


ora 

6 

6.3 5.8 

5.7 

6 

6’ 18 

Untreated 

|||E||9|||| 

5.9 5.6 

5.7 

ran 



sh 5.5 


The problem is to decide whether the addition of phosphorus will 
improve the yield of corn. It may be treated as a problem of testing the 
hypothesis 

$$H_0 : \mu_x = \mu_y$$

against the alternative 

$$H_1 : \mu_x > \mu_y$$

where x and y denote the yield on a treated and untreated plot, 
respectively. It will be assumed that all the plots were treated alike, except for 
the addition of phosphorus to half of them selected at random, and that 
the yield of corn on a plot may be treated as a normal variable. It will 
also be assumed that $\sigma_x = \sigma_y$. These assumptions are sufficient to permit 
formula (16) to be applied to this problem. Calculations here give 

$$\bar{x} = 6, \qquad n_x s_x^2 = .64$$

$$\bar{y} = 5.7, \qquad n_y s_y^2 = .24$$

When (16) is applied, 

$$t = \frac{.3}{\sqrt{.64 + .24}}\sqrt{\frac{100(18)}{20}} = 3.03, \qquad \nu = 18$$

From Table IV the .005 critical value of t is t = 2.878, using only the 
right tail because of $H_1$; consequently, this result is certainly significant, 
and the hypothesis of no increase in mean yield will be discarded. 
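Formula (16) is easily evaluated from the summary figures quoted above. In the sketch below the function name and argument names are chosen for illustration, and the hypothetical difference of the population means is taken as zero.

```python
import math

def pooled_t(xbar, ybar, nx_sx2, ny_sy2, nx, ny, delta=0.0):
    """Formula (16); nx_sx2 and ny_sy2 are the sums of squared deviations
    n_x * s_x^2 and n_y * s_y^2, and delta is the hypothetical mu_x - mu_y."""
    nu = nx + ny - 2
    scale = math.sqrt(nx * ny * nu / (nx + ny))
    t = (xbar - ybar - delta) * scale / math.sqrt(nx_sx2 + ny_sy2)
    return t, nu

t_value, nu = pooled_t(xbar=6.0, ybar=5.7, nx_sx2=0.64, ny_sy2=0.24, nx=10, ny=10)
print(t_value, nu)
print(t_value > 2.878)  # compare with the .005 right-tail critical value
```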

If the assumptions of normality and equality of variances are reasonable 
so that the experimenter can justifiably claim that this significant 
difference is caused by a real difference in the population means, he will 
undoubtedly want confidence limits for $\mu_x - \mu_y$. The same calculations 
as before give 

$$t = \frac{.3 - (\mu_x - \mu_y)}{.0989}$$

Then, 95 per cent confidence limits are given by 

$$\frac{|.3 - (\mu_x - \mu_y)|}{.0989} < 2.101$$

which reduces to 

$$.092 < \mu_x - \mu_y < .508$$

From this result it is clear that for a sample as small as 10 one cannot 
promise with any great degree of certainty more than about .092 unit 
increase in yield, which is only about a 2 per cent increase in the mean 
yield of $\bar{y} = 5.7$ because of the addition of this amount of phosphorus. 
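The confidence limits for the difference of the two mean yields can be retraced from the same summary figures. The sketch below computes the divisor .0989 and then the two limits; the variable names are chosen here for illustration, and 2.101 is the critical value quoted in the text for 18 degrees of freedom.

```python
import math

nx = ny = 10
nu = nx + ny - 2
# Divisor of t in formula (16) for these data, about .0989.
std_error = math.sqrt(0.64 + 0.24) / math.sqrt(nx * ny * nu / (nx + ny))

diff = 6.0 - 5.7   # observed difference of the sample means
t_crit = 2.101     # two-sided 5 per cent value for 18 degrees of freedom
low = diff - t_crit * std_error
high = diff + t_crit * std_error
print(std_error, low, high)
```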

The preceding methods are valid only under the assumption that 
$\sigma_x = \sigma_y$. If $\sigma_x \neq \sigma_y$ but the values of $\sigma_x$ and $\sigma_y$ are known, one can test 
the hypothesis $\mu_x = \mu_y$ by means of the standard normal variable 

$$\tau = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sigma_{\bar{x} - \bar{y}}} = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}}$$

The values of the two variances are seldom known; therefore it is usually 
necessary to replace them by their sample estimates, just as was done for 
the large-sample method in 6.7. The difficulty here is that only small 
samples are assumed to be available. 

If $\sigma_x^2$ and $\sigma_y^2$ are replaced by their unbiased sample estimates, 

$$\hat{s}_x^2 = \frac{\sum_{i=1}^{n_x} (x_i - \bar{x})^2}{n_x - 1} \quad \text{and} \quad \hat{s}_y^2 = \frac{\sum_{i=1}^{n_y} (y_i - \bar{y})^2}{n_y - 1}$$

the resulting variable 

$$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\hat{s}_x^2}{n_x} + \dfrac{\hat{s}_y^2}{n_y}}} \tag{17}$$

can be shown to possess an approximate Student t distribution. This is 
not surprising in view of the fact that Student's t is obtained by replacing 
the unknown variance by its unbiased sample estimate in the corresponding 
expression for a single variable. The number of degrees of freedom 
necessary to make (17) an approximate t variable is given by a rather 
elaborate formula, namely, 

y = 


+ 1 

. . . ...■ ■■■ 

Although $\nu$ is not likely to be an integer, it usually suffices to choose the 
nearest integer value in looking up critical values of t. 

The foregoing problem is known as the Behrens-Fisher problem. 
There has been much controversy over how it should be solved, and the 
approximate solution here is but one version. 



11.5.3 Confidence Limits for a Regression Coefficient 

The problem to be considered in this section is that of determining 
whether the difference between the slopes of a sample and a theoretical 
regression line might reasonably be caused by sampling variation. Let 
X and Y denote the two variables, and let $X_i$ and $Y_i$ (i = 1, 2, ..., n) 
denote their sample values for a random sample of size n. The 
corresponding small letter is used to represent the variable measured from its 
mean. With this notation, the equation of the least-squares, or maximum 
likelihood, regression line as given by (7), Chapter 7, is y' = bx, where 

$$b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$

The assumptions made in 8.4 are made here also. They consist in 
assuming that repeated samples of size n are selected in such a manner 
that the same set of X values as the original set is obtained each time and 
that the $Y_i$ are independently normally distributed about a true regression 
line whose equation may be written in the form 

$$Y' = \alpha + \beta X$$

with the same variance $\sigma^2$, for all $X_i$. Since the same set of X's, hence the 
same set of x's, is obtained in each sample of n, the x's may be treated 
as constants with respect to the sampling. The value of $Y_i$ corresponding 
to $X_i$, however, varies with each sample of n in the manner just described. 
Although the X's and Y's were assumed to be chosen at random, the X's 
need not be so chosen. In practice, one usually chooses them in advance 
to cover adequately the range of X of interest and then selects the Y's 
corresponding to these values of X in a random manner. This is discussed 
more fully in 8.4. 

For simplicity of notation let 

$$w_i = \frac{x_i}{\sum_{j=1}^{n} x_j^2} \tag{18}$$

Then 

$$b = \sum_{i=1}^{n} w_i Y_i$$

Since the $x_i$ may be treated as constants with respect to the sampling, 
the $w_i$ may also be so treated; hence b may be treated as a random 
variable that is a linear function of the random variables $Y_1, Y_2, \ldots, Y_n$. 
Now the solution of problem 38 of Chapter 6 shows that a linear combination 
of independent normal variables is also a normal variable; hence b 
is a normal variable. Since the mean and variance of b will be needed, 
consider their evaluations next. 

Using expected values, 

$$E[b] = E\left[\sum w_i Y_i\right] = \sum w_i E[Y_i]$$



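But from the assumption that the means of the $Y_i$ lie on the true 
regression line, 

$$E[Y_i] = \alpha + \beta X_i$$

Hence 

$$E[b] = \sum w_i(\alpha + \beta X_i) = \alpha \sum w_i + \beta \sum w_i X_i$$

Since $\sum x_i = 0$ because $x_i = X_i - \bar{X}$, it follows from (18) that 
$\sum w_i = 0$ and $\sum w_i X_i = \sum w_i x_i = 1$, and therefore that 

$$E[b] = \beta$$

This shows that the mean value of the slope of the sample regression line 
is equal to the slope of the population regression line, or in the language 
of Chapter 9, that b is an unbiased estimator of $\beta$. 

Since the $Y_i$ are statistically independent and have the same variances, 
it follows from formula (21), Chapter 9, that 

$$\sigma_b^2 = \sum w_i^2 \sigma^2$$

Substituting (18), 

$$\sigma_b^2 = \frac{\sigma^2}{\sum x_i^2}$$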
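The properties of the weights $w_i$ of (18) that lead to the mean and variance of b can be verified on any fixed set of X values. The sketch below uses four hypothetical X values and hypothetical values of $\alpha$ and $\beta$, chosen only for illustration, and checks that the weights sum to zero, that $\sum w_i X_i = 1$, that $\sum w_i^2 = 1/\sum x_i^2$, and that the expected slope equals $\beta$.

```python
X = [1.0, 2.0, 4.0, 7.0]                 # hypothetical fixed X values
xbar = sum(X) / len(X)
x = [Xi - xbar for Xi in X]              # deviations from the mean
sxx = sum(xi ** 2 for xi in x)
w = [xi / sxx for xi in x]               # the weights of formula (18)

sum_w = sum(w)                                    # should be 0
sum_wX = sum(wi * Xi for wi, Xi in zip(w, X))     # should be 1
sum_w2 = sum(wi ** 2 for wi in w)                 # should equal 1 / sxx

alpha, beta = 3.0, 0.5                   # hypothetical true regression line
expected_b = sum(wi * (alpha + beta * Xi) for wi, Xi in zip(w, X))
print(sum_w, sum_wX, sum_w2, 1.0 / sxx, expected_b)
```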


From the preceding results, it follows that the variable 

$$u = \frac{b - \beta}{\sigma_b} = \frac{(b - \beta)\sqrt{\sum x_i^2}}{\sigma}$$

possesses the properties of the variable u in Theorem 3. In order to be 
able to apply Theorem 3 to this problem, it is necessary to find an 
independent variable to serve as $v^2$. In the preceding applications of this 
theorem such a variable was obtained by recognizing that $ns^2/\sigma^2$ possesses 
a $\chi^2$ distribution. Since $\sigma^2$ for this problem is the variance of the 
deviations of the $Y_i$ from the true regression line, the quantity to use in place 
of $ns^2$, the sample estimate of $n\sigma^2$, is $\sum (Y_i - Y_i')^2$. With this choice, one 
would expect the variable 

$$v^2 = \frac{\sum (Y_i - Y_i')^2}{\sigma^2}$$

to possess a $\chi^2$ distribution. It can be shown with considerable difficulty 
that $v^2$ does possess a $\chi^2$ distribution, but with n - 2, not n - 1, degrees 
of freedom, and that u and $v^2$ are independently distributed. These facts 
are assumed here. A direct application of Theorem 3 to the preceding 
u and v variables will then show that 

$$t = (b - \beta)\sqrt{\frac{(n - 2)\sum (X_i - \bar{X})^2}{\sum (Y_i - Y_i')^2}} \tag{19}$$

possesses a Student's t distribution with n - 2 degrees of freedom. By 
means of (19) one can test hypothetical values of regression slopes and 
find confidence limits for them. 

As an illustration of how (19) is applied, consider the data of Table 3 
on the relationship between the thickness of coatings of galvanized zinc 
as measured by a standard stripping method Y and a magnetic method X. 

Table 3 

Y   116  132  104  139  114  129  720  174  312  338  465 
X   105  120   85  121  115  127  630  155  250  310  443 

If the magnetic method were reliable for measuring the thickness of 
such coatings, it would be preferred to the standard stripping method 
because it does not destroy the sample being measured and the standard 
method does. Now suppose that the magnetic method yields the same 
mean thickness as the standard method for thicknesses in the normal 
range. Then the true regression line of Y on X will be the line Y = X. 
Thus, under this assumption of the consistency of the two methods, 
$\beta = 1$. If, contrary to the preceding supposition, the magnetic method 
were biased in giving, say, too small a reading for thin coatings, then the 
true regression line, provided that the regression curve is a straight line, 
would have a slope greater than 1. 

In view of the preceding discussion, consider the problem of testing 
the consistency of the two methods for measuring the thicknesses of 
coatings. The problem may be treated as a problem of testing the 
hypothesis 

$$H_0 : \beta = 1$$

against the alternative hypothesis 

$$H_1 : \beta \neq 1$$

If it is assumed that the necessary conditions for applying (19) are 
satisfied, then the data of Table 3 may be used to yield the information needed 
in (19). It will be found that the equation of the least squares line fitted 
to the data is 

$$Y' = 1.12X - 1.79$$

It will also be found that 

$$\sum (X_i - \bar{X})^2 = 301{,}826 \quad \text{and} \quad \sum (Y_i - Y_i')^2 = 2766$$


SMALL SAMPLE DISTRIBUTIONS “ ' f " 283 

If these values are used in (19), then 

/?(30U2Q_3„ 

2766 

From Table IV in Appendix 2 the 5 per cent critical value of / is 2.26; 
consequently this value is significant. It appears that there is a slight bias 
in the magnetic method of the type suggested earlier. 
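
The arithmetic of this illustration can be retraced with a short program. The following is a minimal sketch, not part of the original text: the function name `slope_t_statistic` is an assumption made for illustration, and because it works with unrounded values its results may differ slightly in the last digits from the rounded hand computation.

```python
import math

# Table 3 data: Y = standard stripping method, X = magnetic method.
Y = [116, 132, 104, 139, 114, 129, 720, 174, 312, 338, 465]
X = [105, 120, 85, 121, 115, 127, 630, 155, 250, 310, 443]


def slope_t_statistic(x, y, beta0):
    """Return (b, a, sxx, sse, t) for the least squares line y' = a + b*x
    and the statistic (19) for testing the hypothesis slope = beta0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # least squares slope
    a = y_bar - b * x_bar              # least squares intercept
    # Sum of squared deviations of the Y values from the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    t = (b - beta0) * math.sqrt((n - 2) * sxx / sse)
    return b, a, sxx, sse, t


b, a, sxx, sse, t = slope_t_statistic(X, Y, beta0=1.0)
print(f"line: Y' = {b:.2f} X + ({a:.2f})")
print(f"sum (X - Xbar)^2 = {sxx:.0f}, residual sum of squares = {sse:.0f}")
print(f"t = {t:.2f} with {len(X) - 2} degrees of freedom")
```

The computed $t$ is then compared with the tabulated critical value for $n - 2 = 9$ degrees of freedom, exactly as in the text.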

In the preceding problem interest was centered exclusively on the
consistency of the two methods. No attempt was made to consider the
precision of the magnetic method as a substitute for the standard method.
This problem can be solved by studying the variance of the errors of
estimation. If the magnetic method were sufficiently precise to justify
its use, then the preceding discussion and test would suggest that a larger
sample be taken to obtain an accurate estimate of $\beta$ so that the bias
could be estimated accurately and a correction made for it.

The preceding method for finding confidence limits for the slope of a
regression line can be generalized to find confidence limits for the
regression coefficients in multiple and curvilinear regression. It can also be
adapted to finding confidence limits for the ordinate of a regression
curve corresponding to any fixed value of $x$. All of these problems give
rise to the $t$ distribution. References for these applications are given at
the end of the chapter.

Thus far, Student’s t distribution has been justified only on the grounds
that it eliminates an inaccuracy of certain large sample methods. It is
conceivable that there are other tests which overcome this inaccuracy
and which at the same time are better tests than the t test in the sense of
Chapter 9. It can be shown, however, that the tests using the $t$
distribution that have been considered possess optimum properties from this
point of view.

11.6 The F Distribution

It will be recalled that it was necessary to assume that $\sigma_x = \sigma_y$ in order
to apply the t distribution to testing the difference between two means.
In order to check on this assumption, it is necessary to derive a frequency
function that can be used for testing the equality of two variances. It will
be found that such a frequency function has many other uses as well.

Let $u$ and $v$ possess independent $\chi^2$ distributions with $\nu_1$ and $\nu_2$ degrees
of freedom, respectively. Then consider the problem of finding the
frequency function of the variable

(20) $F = \dfrac{u/\nu_1}{v/\nu_2}$


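Formula (3) can be used to solve this problem in much the same manner
as it was used to find the frequency function of Student’s t variable. Since
$u$ possesses a $\chi^2$ distribution with $\nu_1$ degrees of freedom, the distribution
of the numerator variable $u/\nu_1$ in (20) can be found by using the change
of variable technique given in (42), Chapter 5. For this purpose let
$x = u$ and $y = u/\nu_1$; then the change of variable is given by $y = x/\nu_1$,
and formula (42), Chapter 5, gives

$$g(y) = f(x)\,\nu_1 = \frac{\nu_1 (\nu_1 y)^{\frac{\nu_1}{2} - 1} e^{-\frac{\nu_1 y}{2}}}{2^{\frac{\nu_1}{2}}\,\Gamma\left(\frac{\nu_1}{2}\right)} = a\,y^{\frac{\nu_1}{2} - 1} e^{-\frac{\nu_1 y}{2}}$$

where the constant $a$ is given by

$$a = \frac{(\nu_1/2)^{\nu_1/2}}{\Gamma\left(\frac{\nu_1}{2}\right)}$$

The denominator variable $v/\nu_2$ in (20) will possess a corresponding
frequency function with a constant $b$ that is obtained from the constant $a$
by replacing $\nu_1$ by $\nu_2$.

Formula (3) may now be applied, provided $z$ is replaced by $F$, to give

$$q(F) = \int_0^\infty x\,a(xF)^{\frac{\nu_1}{2} - 1} e^{-\frac{\nu_1 xF}{2}}\,b\,x^{\frac{\nu_2}{2} - 1} e^{-\frac{\nu_2 x}{2}}\,dx = abF^{\frac{\nu_1}{2} - 1} \int_0^\infty x^{\frac{1}{2}(\nu_1 + \nu_2 - 2)} e^{-\frac{x}{2}(\nu_2 + \nu_1 F)}\,dx$$

Let $w = x(\nu_2 + \nu_1 F)/2$; then $dx = 2\,dw/(\nu_2 + \nu_1 F)$ and

$$q(F) = \frac{abF^{\frac{\nu_1}{2} - 1}\,2^{\frac{\nu_1 + \nu_2}{2}}}{(\nu_2 + \nu_1 F)^{\frac{\nu_1 + \nu_2}{2}}} \int_0^\infty w^{\frac{\nu_1 + \nu_2}{2} - 1} e^{-w}\,dw$$

It will be observed that the value of this integral is $\Gamma[(\nu_1 + \nu_2)/2]$;
consequently $q(F)$ reduces to

$$q(F) = \frac{cF^{\frac{\nu_1}{2} - 1}}{(\nu_2 + \nu_1 F)^{\frac{\nu_1 + \nu_2}{2}}}$$

where

(21) $c = \dfrac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right) \nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}$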


This derivation proves the following theorem. 



Theorem 4: If $u$ and $v$ possess independent $\chi^2$ distributions with $\nu_1$ and
$\nu_2$ degrees of freedom, respectively, then

$$F = \frac{u/\nu_1}{v/\nu_2}$$

has the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom given by

$$f(F) = cF^{\frac{\nu_1}{2} - 1} (\nu_2 + \nu_1 F)^{-\frac{\nu_1 + \nu_2}{2}}$$

where $c$ is given by (21).
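
The frequency function of Theorem 4 can be checked numerically. The sketch below is illustrative only: the helper names `f_density`, `simpson`, and `right_tail` are assumptions, the integration rule is a plain composite Simpson rule, and the code is intended for $\nu_1 \geq 2$ so that the density is finite at $F = 0$. For $\nu_1 = \nu_2 = 2$ the density reduces to $1/(1 + F)^2$, so the right-tail area beyond 19 should be $1/20$.

```python
import math


def f_density(F, nu1, nu2):
    """Frequency function of Theorem 4:
    c * F^(nu1/2 - 1) * (nu2 + nu1*F)^(-(nu1 + nu2)/2), with c from (21)."""
    c = (math.gamma((nu1 + nu2) / 2) * nu1 ** (nu1 / 2) * nu2 ** (nu2 / 2)
         / (math.gamma(nu1 / 2) * math.gamma(nu2 / 2)))
    return c * F ** (nu1 / 2 - 1) * (nu2 + nu1 * F) ** (-(nu1 + nu2) / 2)


def simpson(func, lo, hi, steps=20000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def right_tail(F0, nu1, nu2):
    """P{F > F0}, computed as 1 minus the integral of the density over (0, F0)."""
    return 1.0 - simpson(lambda F: f_density(F, nu1, nu2), 0.0, F0)


# For nu1 = nu2 = 2 the density is 1 / (1 + F)^2, so P{F > 19} = 1/20.
tail = right_tail(19.0, 2, 2)
print(f"P(F > 19) for (2, 2) degrees of freedom: {tail:.4f}")
```

The same `right_tail` function can be used to see how a sample value of $F$ compares with the tabulated right-tail points.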


11.7 Applications of the F Distribution

Since the F distribution was derived partly in order to justify the
assumption of the equality of variances which is needed in the t test
when that test is applied to testing the difference between two means,
consider the problem of testing the hypothesis

$H_0: \sigma_x = \sigma_y$

against the alternative

$H_1: \sigma_x \neq \sigma_y$

under the assumption that $x$ and $y$ are normally distributed.

Let $s_x^2$ and $s_y^2$ be sample variances based on random samples of
sizes $n_x$ and $n_y$, respectively, from these two populations. Then, since
$n_x s_x^2/\sigma_x^2$ and $n_y s_y^2/\sigma_y^2$ possess independent $\chi^2$ distributions,

$$\frac{u}{\nu_1} = \frac{n_x s_x^2}{(n_x - 1)\sigma_x^2}$$

and

$$\frac{v}{\nu_2} = \frac{n_y s_y^2}{(n_y - 1)\sigma_y^2}$$

will satisfy the requirements for $u/\nu_1$ and $v/\nu_2$ in Theorem 4. Under
the hypothesis $H_0$, $\sigma_x^2 = \sigma_y^2$; therefore, by Theorem 4,

$$F = \frac{n_x s_x^2/(n_x - 1)}{n_y s_y^2/(n_y - 1)} = \frac{\hat{\sigma}_x^2}{\hat{\sigma}_y^2}$$

will possess the $F$ distribution with $n_x - 1$ and $n_y - 1$ degrees of freedom.
Here $\hat{\sigma}_x^2$ and $\hat{\sigma}_y^2$ denote the unbiased estimates of $\sigma_x^2$ and $\sigma_y^2$. This
notation is introduced to point out the fact that the value of F to use in testing
$\sigma_x^2 = \sigma_y^2$ is the ratio of the unbiased estimates of the two variances. This
test, like the t test, possesses the desirable feature of being independent of
population parameters.

As a numerical illustration, consider the problem that illustrated the
application of the t distribution to the testing of the difference between
two normal means. From Table 2 and immediately following it,

$$\hat{\sigma}_x^2 = \frac{n_x s_x^2}{n_x - 1} = .071$$

and

$$\hat{\sigma}_y^2 = \frac{n_y s_y^2}{n_y - 1} = .027$$

Therefore $F = 2.63$ with $\nu_1 = 9$ and $\nu_2 = 9$ degrees of freedom. It is necessary
to consult tables of critical values of the F distribution in order to decide
whether this value of F is unreasonably large or small. Such values are
to be found in Table V in Appendix 2.

Since the F distribution depends on the two parameters $\nu_1$ and $\nu_2$, a
three-way table would be needed to tabulate the values of F corresponding
to different probabilities and values of $\nu_1$ and $\nu_2$. As a consequence, only
the 5 and 1 per cent right-tail area points are tabulated corresponding
to various values of $\nu_1$ and $\nu_2$. The technique in the use of Table V is
explained by means of the graph in Fig. 4, which illustrates the graph of
$f(F)$ for a typical pair of values of $\nu_1$ and $\nu_2$. Let $F_1$ denote the value of
$F$ for which $P\{F < F_1\} = .025$ and $F_2$ the value for which $P\{F > F_2\} =
.025$. If the sample value of $F$ falls outside the interval $(F_1, F_2)$, the
hypothesis of a common $\sigma^2$ will be rejected. For convenience of notation,



Fig. 4. A typical F distribution. 



let $F' = 1/F$. Since $F = \hat{\sigma}_x^2/\hat{\sigma}_y^2$ with $\nu_1$ and $\nu_2$ degrees of freedom,
$F' = \hat{\sigma}_y^2/\hat{\sigma}_x^2$ with $\nu_2$ and $\nu_1$ degrees of freedom. By means of the reciprocal
function $F'$, the probability of $F < F_1$ can be evaluated as follows:

$$.025 = P\{F < F_1\} = P\left\{\frac{1}{F} > \frac{1}{F_1}\right\} = P\left\{F' > \frac{1}{F_1}\right\}$$

This result shows that the left critical value of the $F$ distribution
corresponds to the right critical value of the $F'$ distribution. As a result, it is
necessary to find only right critical values for $F$ and $F'$ to determine
$F_2$ and $F_1$. The reciprocal of the right critical value for $F'$ gives the left
critical value for $F$. Because of this property of $F$, only right critical
points for $F$ are tabulated. Unfortunately, only the 5 and 1 per cent
critical points have been tabulated in Table V; consequently, it is necessary
to interpolate between these two values in order to obtain an approximate
2½ per cent critical point.

In view of this reciprocal property, the procedure to be followed is
always to place the larger of the two unbiased variance estimates in the
numerator of $F$; consequently, $\hat{\sigma}_x^2$ will always denote the larger of the
two estimates. If the hypothesis of a common $\sigma^2$ is rejected whenever
the sample value of this $F$ exceeds its 2½ per cent point, the hypothesis
will be rejected whenever the original $F$ falls outside the interval $(F_1, F_2)$,
for, when $F > 1$, $F_2$ will serve as the critical value, and, when $F < 1$,
$F'$ will be used instead and $F_2'$ will serve as the critical value. But, as
demonstrated in the preceding paragraph, $F_2'$ for $F'$ corresponds to $F_1$
for $F$.

If this procedure is applied to the numerical problem being discussed,
it will be found from Table V that the 5 per cent critical value is, by
interpolation,

$F = 4.5, \quad \nu_1 = 9, \quad \nu_2 = 9$

The sample value of $F = 2.63$ is therefore not significant. This result
implies that the assumption of equal variances is a reasonable one and
that the significant value of $t$ obtained in connection with this problem
when testing the hypothesis $\mu_x = \mu_y$ may not be reasonably attributed to
a lack of the assumption $\sigma_x = \sigma_y$ being satisfied. This check on the
reasonableness of the assumption that $\sigma_x = \sigma_y$ is usually carried out
whenever the t test is used to test the difference between two means. It
does not follow, however, that if the hypothesis $\sigma_x = \sigma_y$ is not
substantiated a significant value of $t$ will be due to a lack of this assumption's being
satisfied.
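
The procedure just described (larger estimate in the numerator, interpolated 2½ per cent point) can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, and the two tabulated points 3.18 and 5.35 for 9 and 9 degrees of freedom are standard table entries that are assumed here rather than quoted from the text.

```python
def variance_ratio(var_a, var_b):
    """Place the larger unbiased variance estimate in the numerator of F."""
    return max(var_a, var_b) / min(var_a, var_b)


def interpolate_tail_point(f_05, f_01, tail=0.025):
    """Linear interpolation in the tail area between the tabulated
    5 per cent and 1 per cent right-tail points."""
    weight = (0.05 - tail) / (0.05 - 0.01)
    return f_05 + weight * (f_01 - f_05)


# Unbiased variance estimates from the numerical illustration.
F = variance_ratio(0.071, 0.027)

# Assumed table entries for 9 and 9 degrees of freedom (not quoted in the text).
critical = interpolate_tail_point(f_05=3.18, f_01=5.35)

print(f"F = {F:.2f}, approximate 2 1/2 per cent point = {critical:.1f}")
print("significant" if F > critical else "not significant")
```

Linear interpolation in the tail area is crude, which is why the resulting critical value is described in the text only as approximate.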

The preceding test is not a best test in the sense of Chapter 9, because
it can be shown that there does not exist a best test for this problem;
however, it is known that this is a good test from the type II error point
of view.

Further applications of the F distribution are made in Chapter 12 on 
what is known as analysis of variance techniques. Because of the import- 
ance of such techniques in designing experiments, they have been incor- 
porated in a separate chapter. 


11.8 Distribution of the Range 

In certain fields of applied statistics the amount of routine computa- 
tion becomes burdensome unless methods are chosen that involve only a 
small amount of it. In industrial quality-control work, for example, the 
repeated computation of standard deviations as measures of the variability 
of a product is undesirable. It is customary in such work to take the range 
as the measure of variability. Not only is the range easy to compute, but 
it is also simple to explain as a measure of variation to individuals without 
a statistical background. For small samples from a normal population,
it can be shown that the range is nearly as efficient for estimating $\sigma$ as is
the sample standard deviation; consequently for small samples the range
is a highly useful statistic.

Consider a random sample of size $n$ drawn from the population
whose frequency function is $f(x)$, which is assumed to be continuous.
Let these sample values be arranged in order of increasing magnitude and
denote the ordered set by $x_1, x_2, \cdots, x_n$. Now, consider the problem
of finding the probability that the smallest value $x_1$ and the largest value
$x_n$ will fall within specified intervals. The frequency function of the range
can be found quite easily by means of this probability.

Let the $x$ axis be divided into the five intervals $(-\infty, u)$, $(u, u + \Delta u)$,
$(u + \Delta u, v)$, $(v, v + \Delta v)$, $(v + \Delta v, \infty)$, where $u < v$ are any two values
of $x$. The probability that $x$ will fall in any particular one of these
intervals is given by the integral of $f(x)$ over that interval; hence the
probabilities corresponding to these five intervals can be written down even
though they cannot be evaluated unless the form of $f(x)$ is known. In
this connection, let

(22) $p_2 = \int_u^{u + \Delta u} f(x)\,dx, \quad p_3 = \int_{u + \Delta u}^{v} f(x)\,dx, \quad p_4 = \int_v^{v + \Delta v} f(x)\,dx$

and determine the probability that in a sample of $n$ values of $x$ one will
obtain no value in the first interval, 1 value in the second interval, $n - 2$
values in the third interval, 1 value in the fourth interval, and no value
in the fifth interval. This procedure is equivalent to finding the
probability that the smallest value in the sample will fall between $u$ and $u + \Delta u$,
whereas the largest value will fall between $v$ and $v + \Delta v$. The desired
probability can be obtained directly from the multinomial distribution
given by (39), Chapter 5, by treating $x$ as a discrete variable which can
assume only one of five possible values corresponding to the five intervals.
If $p_1$ and $p_5$ denote the probabilities that $x$ will fall in the first and fifth
intervals, respectively, the desired probability is given by

$$\frac{n!}{0!\,1!\,(n - 2)!\,1!\,0!}\,p_1^0\,p_2^1\,p_3^{n-2}\,p_4^1\,p_5^0$$

which reduces to

(23) $n(n - 1)\,p_2\,p_3^{n-2}\,p_4$

Expression (23) can be simplified somewhat by simplifying the integrals
of (22). Since $f(x)$ is assumed to be a continuous function, the mean value
theorem for integrals may be applied here. This theorem states that if
$f(x)$ is continuous on the interval $(\alpha, \beta)$, then

$$\int_\alpha^\beta f(x)\,dx = (\beta - \alpha) f(\xi)$$

where $\xi$ is some number in the interval $(\alpha, \beta)$. A direct application of
this theorem to (22) shows that

$$p_2 = \Delta u\,f(u + \theta_1 \Delta u), \quad 0 \leq \theta_1 \leq 1$$

and

$$p_4 = \Delta v\,f(v + \theta_2 \Delta v), \quad 0 \leq \theta_2 \leq 1$$

The first of these two results when applied to $p_3$ yields

$$p_3 = \int_{u + \Delta u}^{v} f(x)\,dx = \int_u^v f(x)\,dx - \int_u^{u + \Delta u} f(x)\,dx = \int_u^v f(x)\,dx - \Delta u\,f(u + \theta_1 \Delta u)$$

If these values for $p_2$, $p_3$, and $p_4$ are inserted in (23), it becomes

(24) $n(n - 1) f(u + \theta_1 \Delta u) f(v + \theta_2 \Delta v) \left[\int_u^v f(x)\,dx - \Delta u\,f(u + \theta_1 \Delta u)\right]^{n-2} \Delta u\,\Delta v$



Fig. 5. Sample space for smallest and largest values. 


This expression is the probability that the smallest value of the sample
$x_1$ will lie between $u$ and $u + \Delta u$ and at the same time that the largest
value of the sample $x_n$ will be between $v$ and $v + \Delta v$. Geometrically,
this expression gives the probability that the point $(x_1, x_n)$ will lie inside
the rectangle sketched in Fig. 5. In order to find the probability density
of the two variables $x_1$ and $x_n$ at the point $(u, v)$, it is necessary to divide
the preceding probability by the area of the rectangle, namely $\Delta u\,\Delta v$,
and take the limit of the resulting quotient as $\Delta u$ and $\Delta v$ approach 0. If
this probability density is denoted by $f(u, v)$, it follows from (24) that

(25) $f(u, v) = n(n - 1) f(u) f(v) \left[\int_u^v f(x)\,dx\right]^{n-2}$

Since $f(u, v)$ is the probability density of the variables $x_1$ and $x_n$ at the
arbitrary point $(u, v)$, (25) gives the desired joint frequency function of
the smallest and largest values of a sample of size $n$. These results may be
stated in the following theorem.

Theorem 5: If $u$ and $v$ denote the smallest and largest values, respectively,
in a random sample of size $n$ from the population with the continuous
frequency function $f(x)$, then the joint distribution of $u$ and $v$ is given by

$$f(u, v) = n(n - 1) f(u) f(v) \left[\int_u^v f(x)\,dx\right]^{n-2}$$


The frequency function for the range can be obtained very easily from
this result by means of formula (4). For this purpose it is necessary to
let $y = v$, $x = u$, and $z = R$. Then

$$q(R) = \int f(u, u + R)\,du$$

where the range of integration is over possible values of $u$ when $R$ is fixed.
If the variable $x$ ranges over the interval $(a, b)$, the range of $u$ with $R$
fixed will be from $a$ to $b - R$. This upper limit arises from the fact that
the smallest measurement, $u$, must be $R$ units smaller than the largest
measurement, $v$, and $v$ cannot exceed the upper limit $b$ for $x$. An
expression for $q(R)$ may now be obtained by inserting the value of $f(u, v)$ given
in Theorem 5 and using the limits of integration that were just found.
The results of these operations are expressed in the form of a theorem.

Theorem 6: If the continuous variable $x$ has the frequency function $f(x)$
and if $x$ assumes values in the interval $(a, b)$ only, then the frequency function
of the range $q(R)$ for a random sample of size $n$ is given by the formula

$$q(R) = n(n - 1) \int_a^{b - R} f(u) f(u + R) \left[\int_u^{u + R} f(x)\,dx\right]^{n-2} du$$


Unless the integral of $f(x)$ is quite simple, this expression is likely to be
difficult to work with, even numerically. As an illustration of a simple
problem, consider the range for a sample of size $n$ from the rectangular
distribution that is defined for $0 \leq x \leq 1$ by $f(x) = 1$. Here

$$\int_u^{u + R} f(x)\,dx = \int_u^{u + R} dx = R$$

Therefore, by Theorem 6,

$$q(R) = n(n - 1) \int_0^{1 - R} R^{n-2}\,du = n(n - 1) R^{n-2} (1 - R)$$
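
The result for the rectangular distribution lends itself to a quick numerical check. The sketch below is illustrative and not from the original text: the helper names, the sample size $n = 5$, the seed, and the number of trials are all assumptions. It verifies that $q(R)$ integrates to 1, computes the mean of $R$ from $q(R)$ (integration of $R\,q(R)$ gives $(n - 1)/(n + 1)$), and compares that mean with the average range of simulated samples.

```python
import random


def range_density(R, n):
    """q(R) = n(n - 1) R^(n - 2) (1 - R) for the rectangular distribution on (0, 1)."""
    return n * (n - 1) * R ** (n - 2) * (1 - R)


def midpoint_integral(func, lo, hi, steps=100000):
    """Midpoint-rule approximation of the integral of func over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


n = 5
total = midpoint_integral(lambda R: range_density(R, n), 0.0, 1.0)
mean_R = midpoint_integral(lambda R: R * range_density(R, n), 0.0, 1.0)

# Simulation check: average range of many random samples of size n.
rng = random.Random(12345)  # fixed seed, chosen arbitrarily
trials = 20000
simulated = 0.0
for _ in range(trials):
    sample = [rng.random() for _ in range(n)]
    simulated += max(sample) - min(sample)
simulated /= trials

print(f"integral of q(R) = {total:.4f}")
print(f"E(R) from q(R) = {mean_R:.4f}, simulated = {simulated:.4f}")
```

The simulated mean agrees with the integral only up to sampling error, so the comparison is a plausibility check and not a proof.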


11.9 Applications of the Range 

In the introduction to the last section it was remarked that the range
was useful as a substitute for the standard deviation as a measure of
variability in certain routine operations. It should therefore be of interest
to know what the relationship is between the range and the standard
deviation for, say, a normal distribution. This relationship may be found
by calculating the mean of $R$. Since

$$E(R) = \int_0^\infty R\,q(R)\,dR$$

it is clear from Theorem 6 that the evaluation of the desired relationship
will give rise to a complicated double integral. Unfortunately, when
$f(x)$ is a normal frequency function, these integrations cannot be
performed directly for general $n$; therefore numerical methods of integration
are required. In spite of the complicated nature of the integral defining
$E(R)$, it can be shown that $E(R)$ is a constant, depending on $n$, times $\sigma$.
Tables are available for the normal variable case that express $\mu_R = E(R)$
in terms of $\sigma$ for various values of $n$. Table 4 gives a few entries from
a table to indicate the nature of the relationship.

Table 4

$n$               2      3      4      5      10     50     100
$\mu_R/\sigma$    1.128  1.693  2.059  2.326  3.078  4.498  5.015


As an illustration of the use of such tables, consider once more the
technique of constructing a quality-control chart for $\bar{x}$ as given in 6.6.1. There
a $3\sigma_{\bar{x}}$ band was constructed for controlling $\bar{x}$. If the range is taken
as the measure of variability, $3\sigma_{\bar{x}} = 3\sigma/\sqrt{n}$ will be replaced by $3\mu_R/(d_n\sqrt{n})$,
where $d_n$ is the value obtained from the table, that is, the value of the ratio
$\mu_R/\sigma$ corresponding to the given value of $n$. Now the value of $\mu_R$ can be
estimated by using the sample mean of the $R$ values obtained for the
various samples of $n$ each. For such charts $n$ is usually chosen to be an
integer near 4 and a fairly large number of samples of this size is obtained
before the chart is drawn; consequently, $\mu_R$ is usually estimated quite
accurately.

If $n$ is chosen less than 10, the estimation of $\sigma$ by means of the range
rather than the standard deviation of a sample is quite efficient.
Investigations have shown, for example, that the variance of the estimate of $\sigma$
based on the range of a sample of size 6 is only about 15 per cent larger than
the variance of the sample standard deviation for a sample of size 6. From
the point of view of Chapter 9, one can therefore conclude that the range
is nearly as good as the standard deviation as an estimator for $\sigma$ for small
samples.
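
The replacement of $3\sigma/\sqrt{n}$ by a quantity based on the mean range can be sketched in code. What follows is a minimal illustration under stated assumptions: the function name `control_limits_from_ranges` is hypothetical, the measurements are invented for the example, and the ratios are the Table 4 entries taken to correspond to $n = 2, 3, 4, 5, 10$.

```python
import math

# Ratios mu_R / sigma from Table 4, keyed by the sample size n
# (sample sizes 2, 3, 4, 5, 10 are assumed for the first five entries).
D_N = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 10: 3.078}


def control_limits_from_ranges(samples):
    """Return (center, lower, upper) for a chart of sample means,
    using 3 * Rbar / (d_n * sqrt(n)) in place of 3 * sigma / sqrt(n)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size")
    center = sum(sum(s) / n for s in samples) / len(samples)
    # The mean of the sample ranges estimates mu_R.
    r_bar = sum(max(s) - min(s) for s in samples) / len(samples)
    half_width = 3 * r_bar / (D_N[n] * math.sqrt(n))
    return center, center - half_width, center + half_width


# Hypothetical measurements: six samples of size 4 (invented for illustration).
samples = [
    [40.1, 39.8, 40.4, 39.9],
    [40.3, 40.0, 39.7, 40.2],
    [39.6, 40.1, 40.0, 40.5],
    [40.2, 39.9, 40.3, 39.8],
    [40.0, 40.4, 39.7, 40.1],
    [39.9, 40.2, 40.0, 39.6],
]
center, lower, upper = control_limits_from_ranges(samples)
print(f"center = {center:.3f}, limits = ({lower:.3f}, {upper:.3f})")
```

In practice a much larger number of samples would be used before the chart is drawn, as the text points out.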

REFERENCES 

A proof of the $\chi^2$ distribution of the residual sum of squares, which is needed
to justify the use of the t distribution on regression coefficients, may be found
in S. S. Wilks, Mathematical Statistics, Princeton University Press, pp. 157-159.

The tables for ranges from which Table 4 was extracted may be found in L. H. C.
Tippett, “On the Extreme Individuals and the Range of Samples Taken from a Normal
Population,” Biometrika, 17, 364-387.

The application of the range to quality control charts may be found in E. L. Grant,
Statistical Quality Control, McGraw-Hill Book Co.

The application of the t distribution to the problem of finding confidence limits for
multiple regression coefficients and related problems may be found in H. Cramér,
Mathematical Methods of Statistics, Princeton University Press, pp. 551-554.


EXERCISES 


1. Given /(a?) = x > 0, find, by moment generating function techniques, 
the frequency functioti of 2 == 2«;r. 

2. Given that $x$ is normally distributed and given the sample values $\bar{x} = 42$,
$s = 5$, $n = 20$, (a) test the hypothesis that $\sigma = 8$, (b) find 98 per cent confidence
limits for $\sigma^2$.

3. Work problem 2(b) for n = 40, using the normal approximation suggested 
in Table III in Appendix 2. 

4. A sample of size 8 from a normal population gave the values 9, 14, 10, 12,
7, 13, 11, 12. Find 90 per cent confidence limits for $\sigma$.

5. Given the following sample values from a normal population, finh 96 per 

cent confidence limits for based on combining these sample values properly. 
The sample variances are = 25, = 36, ^ 3 ^ = 16, with 5, Wg ~ 5, 



6 . Given that x is normally distributed with mean and variance show 

that the likelihood ratio test of the hypothesis reduces to a test. 

7. Find formulas for the mean and variance of a $\chi^2$ variable with $\nu$ degrees of
freedom by integration.

8 . Show that 0 ^ 2 ^ = where =* '^xf‘ln and x is normally distributed 
with 0 mean and variance Note that s'^ here is not the customary sample 
variance because the true mean is known. Use the results of problem 7. 

9. Find what value of k will mdkt E{ks^ — a minimum, where is 
defined as in problem 8 . What does this result imply about the unbiased estimate 

of G^ with respect to best estimates ? Use the results of problem 7. 

10 . Determine what value of k will minimize Elkt^ix^ x)^ — G^'f li x is 
normally distributed with mean // and variance 

11. For the data of problem 2, (a) test the hypothesis $\mu = 45$ and (b) find
99 per cent confidence limits for $\mu$.

12. Given $\bar{x} = 20$, $s = 4$, $n = 10$, with $x$ normally distributed, find 95 per
cent confidence limits for $\mu$.

13. Compare the confidence limits obtained in problem 12 with those that
would have been obtained if $s$ had been treated as the true value of $\sigma$ and normal
curve methods of Chapter 6 had been employed.

14. Work problem 4 for $\sigma^2$ rather than $\sigma$.

15. Show that $E[t] = 0$ for Student’s $t$ distribution.

16. The following data give the corrosion effects in various soils for coated
and uncoated steel pipe. Taking differences of pairs of values, test the hypothesis
that the mean of such differences is 0.

Uncoated   42  37  61  74  55  57  44  55  37  70
Coated     39  43  43  52  52  59  40  45  47  62

Uncoated   52  55  60  48  52  44  56  44  38  47
Coated     40  27  50  33  56  36  54  32  39  40

17. Given 2 random samples of sizes 10 and 12 from 2 normal populations
with $\bar{x}_1 = 20$, $\bar{x}_2 = 24$, $s_1 = 5$, $s_2 = 6$, (a) test the hypothesis $\mu_1 = \mu_2$,
(b) find 95 per cent confidence limits for $\mu_1 - \mu_2$, assuming that $\sigma_1 = \sigma_2$.

18. Work problem 17(a) without assuming that $\sigma_1 = \sigma_2$ and compare your
result with that for problem 17(a).

19. Treating the data of problem 16 as random sample values from 2 normal
populations rather than as paired values, test the hypothesis $\mu_1 = \mu_2$.
Explain why it is probably incorrect to apply this test to this problem.

20. The following data give the gains of 20 rats, half of which received their
protein from raw peanuts and half of which received their protein from roasted
peanuts. Test to see whether roasting the peanuts had any effect on their protein
value.

Raw       61  60  56  63  56  63  59  56  44  61
Roasted   55  54  47  59  51  61  57  54  62  58


21. In an industrial experiment a job was performed by 30 workmen according
to method I and by 40 workmen according to method II. The following data
give the results of the experiment. Determine by means of 95 per cent confidence
limits for $\mu_1 - \mu_2$ how much time on the average could be expected to be saved
by using method I.

Time   50  51  52  53  54  55  56  57  58  59  60
I       1   3   5   4   7   5   3   1   1   0   0
II      1   1   2   5   8   9   6   3   3   1   2


22. In estimating the mean of a normal population by means of a confidence
interval, how large a sample is needed so that the length of a 95 per cent
confidence interval will be less than $\sigma/10$ if $\sigma$ is known.

23. Prove that the likelihood ratio test of the hypothesis $\mu = \mu_0$ for a normal
population of unknown variance is equivalent to Student’s $t$ test for this
hypothesis.



24. Prove that the frequency function of the variable $t$ approaches the frequency
function of the standard normal variable as the number of degrees of freedom
$\nu$ becomes infinite. Assume that the constant approaches $1/\sqrt{2\pi}$.

25. Find a likelihood ratio test for testing the hypothesis $\mu_1 = \mu_2$ for two
normal populations with a common variance $\sigma^2$. Assume equal size samples are
taken from the two populations.

26. For the data of problem 9, Chapter 7, find 95 per cent confidence limits
for the slope $\beta$ of the theoretical regression line.

27. Samples of sizes 10 and 20 taken from two normal populations gave == 
12 and = 18. Test the hypothesis Hq'.Oi — 

28. The following table gives data on the hardness of wood stored outside
and inside. Test to see whether the variability of hardness is affected by
weathering.

                                 Outside    Inside
Sample size                          40       100
Mean                                117       132
Sum of squares about the mean     8,655    27,244


29. If one desires to have $\alpha = .05$ and $\beta = .05$ in testing the equality of 2
normal variances when actually one variance is twice the other, how large an
equal size sample from each population should be taken if the right tail of
the F distribution is chosen as the critical region?

30. Verify the .05 value of $F$ for $\nu_1 = 2$ and $\nu_2 = 2$ by direct integration of the
frequency function of $F$.

31 . Derive a formula for obtaining confidence limits for where; and 

are the variances of 2 normal populations, if samples of sizes and « 2 » 

respectively, are taken from those populations. ; . 

32. Find 90 per cent confidence limits for ojoy if 20 samples are taken from 
each of 2 normal populations and if = 3. 

33. The time a; between recordings of certain types of radiation activity is 

known to have the frequency function f(x) = x > 0. How would you 

proceed to construct a test of the hypothesis that the values of a for two <|tifferent 
experiments are the same? 

34. Prove that the variable $t^2$ with $\nu$ degrees of freedom is a special case of
the variable $F$ with $\nu_1 = 1$ and $\nu_2 = \nu$.

35. Criven samples of sizes and Wg, respectively, from 2 normal populations 

with zero means and variances and construct a likelihood ratio; test for 
testing and show that it is equivalent to an Ftest for this problem. 

36. Show that $E[F] = \nu_2/(\nu_2 - 2)$ for the F distribution.

37. If f{x, y) = os >0,y '^0, find the frequency function of la) z = 

00 'f- y, (b) z — Sketch the sample space to obtain the proper limits of 

integration. 





38. Giv^n f{x,y) =2(1 4* a? + > 0, find the frequency function 

of = ic 4- 2 /, 

39. Prove that if $x$ and $y$ are independent standard normal variables, then
$z = y/x$ has a Cauchy distribution.

40. If f(x, 2/)=l, 0 <x ^ 1, 0 ^y ^ I, find the frequency function of 
(a) z = x^, (b) z = X y, (c) z = yjx. Sketch the sample space to obtain the 
proper limits of integration. 

41. If $x$ and $y$ are independent standard normal variables, derive the frequency
function of $z = \sqrt{x^2 + y^2}$. The variable $z$ represents a radial error in gunnery
problems in which $x$ and $y$ represent independent coordinate axes errors with
equal variability.

42. Find boundaries for a quality-control chart for controlling variability if 
samples of size 5 are taken every hour and if it is known from past experience 
that = 10. Use boundaries that will include 98 per cent of the sample values 

5 

of the variable 2 i^i “* 

1 

43. Find the frequency function of JfJ if a? has the frequency function /(a?) = 
x>0, 

44. Find the probability that in a sample of size 10 from the horizontal
distribution $f(x) = 1$, $0 \leq x \leq 1$, the range will exceed .8.

45. Determine how large a random sample must be taken from the horizontal
distribution $f(x) = 1$, $0 < x < 1$, in order that the probability will exceed .95
that the range will exceed .90.

46. Suppose that samples of size 4 are taken from the distribution $f(x) = e^{-x}$,
$x \geq 0$. (a) Calculate the mean value of $R$. (b) Calculate $\sigma$ and then compare the
ratio $E(R)/\sigma$ with that given by Table 4 for a normal variable. (c) Determine
limits $R_1$ and $R_2$ for $R$ such that $P\{R < R_1\} = .025$ and $P\{R > R_2\} = .025$.

47. For a control chart for the sample mean of a normal variable with ju, = 40, 

based bn of size 5, find control boundaries in terms of the range. 

48. For problem 33, Chapter 6, find control boundaries in terms of the mean 
of the sample ranges. 

49. The random samples a?i, x^^ and t/i, Vn^ are taken from 

two standard normal populations. Find the frequency function of the variable z 

«i / /nj \ 

where z = T / (2 + T I • 

50. For a fixed value of x, find the distribution of the random variable = 
y ■]r b{x — x) in 11.5.3. 

51. Use the results in problem 50 to construct a Student t variable for 

by means of which one can obtain confidence limits for E[Yy,']. Assume inde- 
pendence of certain variables if necessary. 

52. Use the general method for finding the distribution function of a function 
of random variables to obtain a formula for the frequency function of 2 : = 
x^jix^ 4- 2 /^), given that x and y are independent continuous variables with the 
same frequency function f and can assume any real value. 



CHAPTER 12


Statistical Design in Experiments 


It is a common occurrence for experimenters who are unacquainted
with statistical principles to seek statistical assistance when their
experiments fail to produce the results anticipated by them. In some
experiments the data were obtained in such a manner as to exclude any valid
conclusions of the type desired; in others, there is little that can be done
to extract further information from the data because the experiment was
not designed with a statistical analysis in mind. Only rarely are the
experiments that give valid conclusions as sensitive as they would have
been if a standard statistical design had been employed. Too many
experimenters do not seem to appreciate the obvious injunction that the
time to design an experiment is before the experiment is begun.

In this chapter, after a brief discussion of a few of the general principles 
involved in the design of experiments, some of the common techniques 
used in the design and analysis of experiments will be studied. 

12.1 Randomization, Replication, and Sensitivity 

In most experiments there are several variables in addition to the one or
more being investigated that need to be controlled if the experiment is to
give valid conclusions. In some cases these interfering variables can be
controlled by laboratory techniques; in others such control may be
possible only by statistical design. As a simple illustration, consider an
agricultural experiment in which two different seed varieties are to be
tested on a piece of land. If the piece of land were divided into two equal
pieces and one variety planted on each, the difference in yields could not
be used as a valid estimate of the differential effect of the two seed varieties
because of the possible difference in the soil fertility of the two pieces.

Experiments can often be made valid by applying the principles of
randomization and replication. Thus, in the present illustration, if the
piece of land were divided into a number of small plots of equal size and
if one variety of seed were planted on half of these plots and the other
variety on the remaining half, with the selection of the plots for each
variety determined by a random process, then the varying fertility of
the land would affect the two varieties approximately equally and
therefore the difference in varietal yields would represent a valid estimate
of the differential effects of the two seed varieties.

Randomization by itself is not necessarily sufficient to yield a valid 
experiment. For example, if one merely tossed a coin to determine which 
half of the original piece of land should be planted with one of the seed 
varieties, the selection would be random but it would not permit the two 
seed varieties to be equally affected by any varying soil fertility. If the 
two seed varieties were equally productive but the two halves of land 
were markedly different in fertility, then regardless of the seed variety 
selected for each half the conclusion would invariably be that the seed 
varieties differed in productivity. In order to insure validity, it would 
be necessary that the piece of land be divided into a sufficiently large num- 
ber of similar plots so that the probability of having one of the seed 
varieties largely located on the more fertile plots would be very small. 
This repetition of an experiment or experimental unit is called replication. 
Thus, to insure validity in an experiment, randomization should be 
accompanied by sufficient replication. 
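
The random allocation of varieties to plots described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels "A" and "B", the function name `assign_varieties`, the choice of 20 plots, and the seed are all hypothetical and do not come from the text.

```python
import random


def assign_varieties(n_plots, seed=None):
    """Randomly assign varieties "A" and "B" to n_plots plots, half to each."""
    if n_plots % 2 != 0:
        raise ValueError("n_plots must be even so that each variety gets half")
    labels = ["A"] * (n_plots // 2) + ["B"] * (n_plots // 2)
    rng = random.Random(seed)  # seeded only to make the illustration repeatable
    rng.shuffle(labels)        # the random process that selects the plots
    return labels


layout = assign_varieties(20, seed=1)
print(layout)  # a random arrangement with ten plots of each variety
```

Increasing `n_plots` corresponds to increasing the replication: with many small plots it becomes very unlikely that one variety lands mostly on the more fertile ground.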

Not only are randomization and replication useful techniques for 
assisting in the construction of valid experiments, but they are often 
essential to certain classes of experiments whose conclusions depend on 
the use of statistical techniques. Since the frequency functions of the 
various statistics considered in the preceding chapters were derived on the 
basis of random sampling, it follows that the methods employed in 
the preceding chapters are applicable to such samples only; consequently, 
any experiment whose conclusions depend on these methods requires 
randomization. Replication is also necessary for the application of any 
method that obtains its measure of variability directly from the data 
because at least two observations are needed to measure variation. For 
example, the illustrative experiment just discussed requires randomiza- 
tion and replication if the difference between mean yields is to be tested 
by means of Student’s t distribution because the t distribution is based on 
random sampling and because sample variances are needed to evaluate t. 

The requirement of random samples for the applicability of most 
statistical methods is not always easy to satisfy. For example, if the 
product of a machine is sampled every hour for several days, it may easily 
happen that the product of the machine changes during the day because 
of the operator’s working pattern and also from day to day because of 
machine wear. For situations such as this, in which observations are
ordered with respect to time, one of the methods for testing randomness,
such as the method of runs discussed in Chapter 13, should be applied
before methods based on random samples are used.

In the preceding illustration the techniques of randomization and
replication removed much of the danger of obtaining biased results;
however, these techniques did not remove the effect of differences in soil
fertility on the variability of yields. If the variation in fertility is increased,
the variation in yield is thereby increased. As a consequence, if Student’s
t distribution for testing the difference between two means were applied, a
considerably larger sample might be needed to produce a significant
difference if large fertility differences existed between plots than if the
plots were of uniform fertility because of the larger estimate of variance
involved in the denominator of t. Such an experiment could therefore
be made more sensitive by selecting plots of uniform fertility. Very often,
however, it is not feasible to control the fertility in this manner. Now,
by arranging the plots into small homogeneous groups and applying
statistical design, it is often possible to eliminate statistically the greater
share of the fertility variability effects in the t test and thereby make the
experiment more sensitive.


12.2 Analysis of Variance 

One of the most useful techniques for increasing the sensitivity of an
experiment is designing it in such a way that the total variation of the
variable being studied can be separated into components that are of
experimental interest or importance. Splitting up the total variation in
this manner enables the experimenter to utilize statistical methods to
eliminate the effects of certain interfering variables and thus to increase
the sensitivity of his experiment. The analysis of variance is a technique
for carrying out the analysis of an experiment designed from this point
of view.

In designing an experiment, the experimenter usually has in mind the
testing of a hypothesis or the estimation of some parameters. Although
the analysis of variance technique enables the experimenter to design
sensitive experiments for either of these basic problems, the explanation
of the technique is made largely from the point of view of testing
hypotheses.

As an illustration of the type of problem for which the analysis of
variance is useful, consider a gunnery experiment in which four different
brands of shells are to be tested to see whether they are equally satisfactory
in quality. The experiment consists of having six different marksmen fire
an equal number of rounds with each brand of shells and recording the
scores made by each marksman for each of the brands. These scores
may be arranged in a rectangular array containing six rows and four
columns; however, for the purpose of considering other problems also,
let the scores be displayed in a rectangular array containing $a$ rows and
$b$ columns as shown in Table 1.


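Table 1

$$
\begin{array}{cccccc|c}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1b} & \bar{x}_{1\cdot} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2b} & \bar{x}_{2\cdot} \\
\vdots & \vdots &        & \vdots &        & \vdots & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{ib} & \bar{x}_{i\cdot} \\
\vdots & \vdots &        & \vdots &        & \vdots & \vdots \\
x_{a1} & x_{a2} & \cdots & x_{aj} & \cdots & x_{ab} & \bar{x}_{a\cdot} \\
\hline
\bar{x}_{\cdot 1} & \bar{x}_{\cdot 2} & \cdots & \bar{x}_{\cdot j} & \cdots & \bar{x}_{\cdot b} & \bar{x}
\end{array}
$$
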


The entries in the margins of the table represent the means of the cor- 
responding rows and columns. The location of the dot in the index 
shows whether the mean is a row mean or a column mean. 

Two well-known mathematical models are available for application 
to experiments of the type being discussed. One of them is called the 
“linear hypothesis” model; the other is known as the “components of 
variance” model. The essential difference between the two models lies 
in the assumptions made concerning the population of experiments of 
which the given experiment is considered to be a random sample. Thus
the $x_{ij}$ in Table 1 are treated as a set of $ab$ random variables, for which
the observed values are the values resulting from a single random
experiment.


12.2.1 Linear Hypothesis Model 

This model assumes that the random variable $x_{ij}$ has a mean which
can be written in the form

(1) $E(x_{ij}) = a_i + b_j + c$

where $c$ denotes the expected value of $\bar{x}$, $a_i$ denotes the expected value of
$\bar{x}_{i\cdot} - \bar{x}$, and $b_j$ denotes the expected value of $\bar{x}_{\cdot j} - \bar{x}$. Since $\bar{x}$ is the
sample mean of both the $\bar{x}_{i\cdot}$ and the $\bar{x}_{\cdot j}$,

$\sum_{i=1}^{a} (\bar{x}_{i\cdot} - \bar{x}) = 0$ and $\sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x}) = 0$

Upon taking the expected value of each of these sums, it therefore follows
that

(2) $\sum_{i=1}^{a} a_i = 0 = \sum_{j=1}^{b} b_j$


Assumption (1) essentially states that the mean of the variable $x_{ij}$ is the
sum of a general mean $c$, a row effect $a_i$, and a column effect $b_j$. Thus,
in the gunnery experiment, if the $i$th marksman were a superior
marksman, his mean score would be expected to exceed the mean score for all
six marksmen by a positive amount $a_i$, whereas if he were an inferior
marksman, $a_i$ would be negative. Similarly, $b_j$ is a number, positive or
negative, that measures the superiority or inferiority of brand $j$ with
respect to the brands being tested. Assumption (1) is more restrictive
than might appear at first glance, because in many practical problems it is
unrealistic to assume that the two variables of classification have their
effects additive in this simple fashion. For example, if the rows of Table
1 corresponded to different amounts of a chemical compound added to
the soil, whereas the columns corresponded to different amounts of a
second chemical compound added, one would not expect the effects of
these compounds on crop productivity to operate independently in this
manner.

In addition to assumption (1), the linear hypothesis model assumes
that the variables $x_{ij}$ are independently and normally distributed with
the same variance $\sigma^2$.

Since the analysis of variance is being introduced as a technique for
increasing the sensitivity of an experiment for testing hypotheses, consider
the problem of testing the hypothesis that the theoretical column means
of Table 1 are equal. For the illustration of marksmen and shell brands,
this would mean testing the hypothesis that the four brands of shells are
equally good, that is, that $\mu_{\cdot 1} = \mu_{\cdot 2} = \cdots = \mu_{\cdot 4}$, where
$\mu_{\cdot j} = E(\bar{x}_{\cdot j})$ denotes the theoretical mean of column $j$. This hypothesis is a
generalization of the hypothesis $\mu_x = \mu_y$ considered in Chapter 11. In
terms of the notation introduced in (1), it follows from (2) that the
hypothesis can be written in the form

(3) $H_0: b_j = 0 \quad (j = 1, 2, \ldots, b)$

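Under the foregoing assumptions and notation, the analysis of variance
technique proceeds as follows. Write the total sum of squares of
deviations of the variables $x_{ij}$ from their sample mean $\bar{x}$ in the following form:

$$
\sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x})^2
= \sum_{i=1}^{a} \sum_{j=1}^{b} \left[ (\bar{x}_{i\cdot} - \bar{x}) + (\bar{x}_{\cdot j} - \bar{x}) + (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}) \right]^2
$$

If the trinomial on the right is squared and summed term by term, it will
be found that the sums involving cross-product terms vanish, hence that

(4)
$$
\begin{aligned}
\sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x})^2
&= \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{i\cdot} - \bar{x})^2
 + \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2 \\
&\quad + \sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2
\end{aligned}
$$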

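As a partial verification of the fact that the cross-product terms do
vanish, consider the evaluation of the second cross-product term. It is
convenient to sum with respect to $j$ first; thus

$$
\sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{i\cdot} - \bar{x})(x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})
= \sum_{i=1}^{a} (\bar{x}_{i\cdot} - \bar{x}) \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})
$$

But, summing term by term, it is clear from Table 1 that

$$
\sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})
= b\bar{x}_{i\cdot} - b\bar{x}_{i\cdot} - b\bar{x} + b\bar{x} = 0
$$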

Formula (4) shows that the total sum of squares can be broken down
into three components, the first component measuring the variation of
row means, the second component measuring the variation of column
means, and the third component measuring the variation in the variables
$x_{ij}$ after the row and column effects have been eliminated.
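
The breakdown of the total sum of squares into row, column, and remainder components can be checked numerically. The following is a minimal sketch, assuming the table is held as a list of rows; the function name `sums_of_squares` and the small 2 x 3 table of scores are hypothetical and serve only to illustrate identity (4).

```python
def sums_of_squares(x):
    """Return the total, row, column, and remainder sums of squares in (4)."""
    a, b = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (a * b)
    row_means = [sum(row) / b for row in x]
    col_means = [sum(row[j] for row in x) / a for j in range(b)]
    total = sum((v - grand) ** 2 for row in x for v in row)
    rows = b * sum((m - grand) ** 2 for m in row_means)   # double sum over i and j
    cols = a * sum((m - grand) ** 2 for m in col_means)   # double sum over i and j
    rem = sum(
        (x[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    return total, rows, cols, rem


scores = [[3.0, 5.0, 4.0], [6.0, 9.0, 8.0]]  # hypothetical 2 x 3 table
total, rows, cols, rem = sums_of_squares(scores)
print(abs(total - (rows + cols + rem)) < 1e-9)  # identity (4) holds up to rounding
```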

The purpose of the breakdown in (4) was to separate the total variation
into components that are of experimental interest and that can be used
in a significance test using the F distribution of Chapter 11. It will turn
out that the F value to use involves the ratio of two of the three sums of
squares on the right side of (4). It is clear that the second sum of squares
on the right should be used in the test because it measures the variation of
column means and this variation is likely to be excessively large when $H_0$
is not true as compared to its value when $H_0$ is true. The last sum of squares
should also be selected because it measures the variation of the $x_{ij}$ after
the variation due to row differences and column differences has been
eliminated, and therefore it should prove useful as a basis for comparison
for the second sum of squares. This technique of finding a measure of
variation that has eliminated the effect of an interfering variable, such as
plot fertility, and using it as a basis for comparison with the variation of
experimental interest, is a technique that often increases the sensitivity
of the experiment remarkably. With the selection of these two sums of
squares, the problem of testing $H_0$ is reduced to the problem of
determining how to apply the F distribution to these two sums of squares.
Consider, therefore, the method of converting these sums of squares into
$\chi^2$ variables.

The variable $\bar{x}_{\cdot j}$ is a normal variable because it is a linear combination
of the basic variables $x_{ij}$, which are assumed to be normal. The mean of
$\bar{x}_{\cdot j}$, because of (1) and (2), is given by

$$
E(\bar{x}_{\cdot j}) = E\left( \frac{1}{a} \sum_{i=1}^{a} x_{ij} \right)
= \frac{1}{a} \sum_{i=1}^{a} (a_i + b_j + c)
= b_j + c
$$

But when $H_0$ is true, it follows from (3) that $E(\bar{x}_{\cdot j}) = c$. The variance of
$\bar{x}_{\cdot j}$ may be found by realizing that $\bar{x}_{\cdot j}$ is the mean of $a$ independent
variables having the same variance $\sigma^2$. Thus the variance of $\bar{x}_{\cdot j}$ is equal to
$\sigma^2/a$. The variables $\bar{x}_{\cdot j}$ are independent because the $x_{ij}$ are independent;
therefore, these results show that the variables $\bar{x}_{\cdot j}$ are independently and
normally distributed with the same means, $c$, and the same variances,
$\sigma^2/a$, when $H_0$ is true. By Theorem 1, Chapter 11, it therefore follows that

(5) $\dfrac{\sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2}{\sigma^2/a} = \dfrac{1}{\sigma^2} \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2$

will possess a $\chi^2$ distribution with $b - 1$ degrees of freedom. This proves
that the second sum of squares on the right of (4), when divided by $\sigma^2$,
possesses a $\chi^2$ distribution with $b - 1$ degrees of freedom, provided that
$H_0$ is true.

The demonstration that the last sum of squares of (4) can be converted
into a $\chi^2$ variable is considerably more difficult than that just given for
the second sum of squares. Because of the length and difficulty of the
demonstration, the desired result is accepted without proof. Thus it is
accepted that

(6) $\dfrac{1}{\sigma^2} \sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2$

possesses a $\chi^2$ distribution with $(a - 1)(b - 1)$ degrees of freedom. The
reason for this number of degrees of freedom is that in the derivation
showing that (6) has a $\chi^2$ distribution it is shown that the degrees of
freedom on the left of (4) equal the sum of the degrees of freedom on the right.
Since the left side of (4), when divided by $\sigma^2$, would possess a $\chi^2$
distribution with $ab - 1$ degrees of freedom if the means of the $x_{ij}$ were equal,
and since the first sum on the right has $a - 1$ degrees of freedom and the
second has $b - 1$, it follows by subtraction that the last sum on the right
must have

$ab - 1 - [(a - 1) + (b - 1)] = (a - 1)(b - 1)$

degrees of freedom.

Finally, in order to be able to apply Theorem 4, Chapter 11, to (5) 
and (6), it is necessary to know that (5) and (6) are independently dis- 
tributed. The demonstration of this fact is quite difficult; hence the 
independence of (5) and (6) is also accepted without proof. 

In view of the preceding discussion and Theorem 4, Chapter 11, if
(5) is divided by $b - 1$ and (6) is divided by $(a - 1)(b - 1)$, the ratio
of the resulting quantities will possess an F distribution. This result may
be summarized in the following manner.

(7) Linear Hypothesis F Test: If the variables $x_{ij}$ are independently and
normally distributed with means $E(x_{ij}) = a_i + b_j + c$ and variances $\sigma^2$, the
hypothesis $H_0: b_j = 0$ $(j = 1, \ldots, b)$ may be tested by using the right tail
of the F distribution as critical region, where

$$
F = \frac{(a - 1) \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2}
         {\sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2}
$$

and where $\nu_1 = b - 1$ and $\nu_2 = (a - 1)(b - 1)$.

The right tail of the F distribution is selected as the critical region
because the numerator of F is likely to be excessively large when $H_0$ is
false. With this choice of critical region, the F test is known to be very
good from the type II error point of view.

The equality of row means can be tested in a similar manner by using 
the first sum of squares on the right of (4) in the numerator of F and 
changing the degrees of freedom accordingly. Although the numerator 
in (7) can be written as a single sum, it is written as a double sum to remind 
one of the simple manner in which F can be written down. All that one 
needs to do is to write out the fundamental identity (4), divide the sums of 
squares by their degrees of freedom, and take the proper ratio of two such 
quantities. The proper ratio depends on whether one is testing the equality 
of column means or the equality of row means. 
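
The recipe just described (form the sums of squares in (4), divide each by its degrees of freedom, and take the proper ratio) can be sketched as a small function. The name `two_way_f`, the returned dictionary layout, and the 2 x 2 table are hypothetical illustrations; the function returns only the F values and degrees of freedom and leaves the comparison with a critical value to an F table.

```python
def two_way_f(x):
    """Return (F, nu1, nu2) for the column test of (7) and the analogous row test."""
    a, b = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (a * b)
    row_means = [sum(row) / b for row in x]
    col_means = [sum(row[j] for row in x) / a for j in range(b)]
    ss_rows = b * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand) ** 2 for m in col_means)
    ss_rem = sum(
        (x[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    if ss_rem == 0:
        raise ValueError("remainder sum of squares is zero; F is undefined")
    nu2 = (a - 1) * (b - 1)
    ms_rem = ss_rem / nu2                      # remainder mean square
    f_cols = (ss_cols / (b - 1)) / ms_rem      # tests equality of column means
    f_rows = (ss_rows / (a - 1)) / ms_rem      # tests equality of row means
    return {"columns": (f_cols, b - 1, nu2), "rows": (f_rows, a - 1, nu2)}


result = two_way_f([[1.0, 2.0], [3.0, 5.0]])  # hypothetical 2 x 2 table
print(result)
```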


12.2.2 Application of the Linear Hypothesis Model 

For the purpose of illustrating the use of (7), consider the data of Table
2 on the yield of potatoes. Four plots of land were divided into five
subplots each. For each plot, the five treatments were assigned at random
to the five subplots. The problem here is to test whether the five
treatments are equally effective with respect to mean yield.


Table 2

                  Treatment
Plot      A      B      C      D      E
  1      310    353    366    299    367
  2      284    293    335    264    314
  3      307    306    339    311    377
  4      267    308    312    266    342


The numerator sum of squares in (7) is readily computed directly
from the means of the columns and the grand mean; however, the
denominator sum of squares is most easily computed indirectly by computing
the other sums of squares in (4) and then solving for this sum of
squares. Calculations here yield the values

$\sum_{j=1}^{5} (\bar{x}_{\cdot j} - \bar{x})^2 = 3178$

$\sum_{i=1}^{4} (\bar{x}_{i\cdot} - \bar{x})^2 = 1286$

$\sum_{i=1}^{4} \sum_{j=1}^{5} (x_{ij} - \bar{x})^2 = 21{,}530$

Therefore, by formula (4), it follows that

$\sum_{i=1}^{4} \sum_{j=1}^{5} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2 = 21{,}530 - 4(3178) - 5(1286) = 2388$

As a result, the F value in (7) becomes

$F = \dfrac{3 \cdot 4(3178)}{2388} = 16.0, \quad \nu_1 = 4, \quad \nu_2 = 12$



From Table V in Appendix 2 it is clear that this result is significant;
therefore, the five treatments undoubtedly differ in their effect on yield.

Since the preceding computations give the necessary sums of squares
for testing the hypothesis that the row means are equal, that is, for testing
the hypothesis

$H_0: a_i = 0 \quad (i = 1, \ldots, a)$

this hypothesis will also be tested. The value of F now becomes

$F = \dfrac{4 \cdot 5(1286)}{2388} = 10.8, \quad \nu_1 = 3, \quad \nu_2 = 12$



This result is also significant, which means that the four plots undoubtedly 
differ in fertility. 
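
The arithmetic of this example can be reproduced with the following sketch, which follows the indirect route described above: the remainder sum of squares is obtained by subtraction from identity (4). The variable names are hypothetical; the data are those of Table 2, with rows taken as plots and columns as treatments.

```python
# Table 2 yields: rows are plots 1-4, columns are treatments A-E.
yields = [
    [310, 353, 366, 299, 367],
    [284, 293, 335, 264, 314],
    [307, 306, 339, 311, 377],
    [267, 308, 312, 266, 342],
]
a, b = len(yields), len(yields[0])
grand = sum(sum(row) for row in yields) / (a * b)
col_means = [sum(row[j] for row in yields) / a for j in range(b)]
row_means = [sum(row) / b for row in yields]

ss_cols = a * sum((m - grand) ** 2 for m in col_means)   # 4 times 3178
ss_rows = b * sum((m - grand) ** 2 for m in row_means)   # 5 times 1286
ss_total = sum((v - grand) ** 2 for row in yields for v in row)
ss_rem = ss_total - ss_cols - ss_rows                    # solved from identity (4)

f_cols = (a - 1) * ss_cols / ss_rem   # treatments, 4 and 12 degrees of freedom
f_rows = (b - 1) * ss_rows / ss_rem   # plots, 3 and 12 degrees of freedom
print(ss_cols, ss_rows, ss_total, ss_rem)
print(round(f_cols, 1), round(f_rows, 1))
```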

The computational results for analysis of variance problems are usually 
displayed in table form. Table 3 illustrates this type of summary for 
the problem just discussed. 

Table 3

Source of       Sum of               Mean         F
variation       squares     d.f.     square       value
Columns         12,712        4      3178         16.0
Rows             6,430        3      2143         10.8
Remainder        2,388       12       199
Totals          21,530       19


The entries in the second column are the sums of squares in the funda- 
mental identity (4). The third column lists the corresponding degrees 
of freedom, and the fourth column gives each sum of squares divided by 
its degrees of freedom. These entries are the values needed for the F 
ratios, which are displayed in the last column. 

In order to observe the increased sensitivity obtained by eliminating 
the variation due to differences in plot fertility when testing the hypoth- 
esis that the treatment means are equal, consider how the hypothesis 
would have been tested if the row classification were not available. This 
would be the situation, for example, if the five treatments had been 
assigned to the 20 subplots at random. 

The fundamental identity (4) now reduces to

$$
\sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x})^2
= \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2
+ \sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{\cdot j})^2
$$

It is easy to show that the second sum on the right, when divided by $\sigma^2$,
has a $\chi^2$ distribution with $b(a - 1)$ degrees of freedom. Then accepting
the fact that this $\chi^2$ variable and the $\chi^2$ variable given by (5) are
independently distributed, it follows that the F distribution may be applied to
give

(8) $F = \dfrac{b(a - 1) \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2}{(b - 1) \sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{\cdot j})^2}, \quad \nu_1 = b - 1, \quad \nu_2 = b(a - 1)$

The earlier calculations for Table 2 may be used to give the necessary
values here. It will be found that

$F = \dfrac{5 \cdot 3(12{,}712)}{4(8{,}818)} = 5.41, \quad \nu_1 = 4, \quad \nu_2 = 15$

The 5 per cent and 1 per cent critical values are 3.06 and 4.89; hence
this value is still significant at the 1 per cent significance level but only
barely so. A comparison of this result with the earlier result in which
F = 16.0 shows that the segregation of plot differences in (4) gave rise to
a much more sensitive experiment than that obtained by ignoring them.
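
For comparison, the F value of (8), in which the row classification is ignored, can be computed from the same data. This is a sketch only; the variable names are hypothetical, and the within-column sum of squares is computed directly rather than by subtraction.

```python
# Table 2 yields again; here the plot (row) classification is ignored.
yields = [
    [310, 353, 366, 299, 367],
    [284, 293, 335, 264, 314],
    [307, 306, 339, 311, 377],
    [267, 308, 312, 266, 342],
]
a, b = len(yields), len(yields[0])
grand = sum(sum(row) for row in yields) / (a * b)
col_means = [sum(row[j] for row in yields) / a for j in range(b)]

ss_cols = a * sum((m - grand) ** 2 for m in col_means)
ss_within = sum(
    (yields[i][j] - col_means[j]) ** 2 for i in range(a) for j in range(b)
)

# Formula (8): nu1 = b - 1 = 4 and nu2 = b(a - 1) = 15.
f_one_way = b * (a - 1) * ss_cols / ((b - 1) * ss_within)
print(ss_within, round(f_one_way, 2))
```

The within-column sum of squares (8,818) still contains the plot-to-plot variation that the two-way analysis removed, which is why this F value is so much smaller than 16.0.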

The preceding illustration may give one the impression that the
experimenter can choose either one of the two tests applied there to determine
whether a set of column means is equal. This is not strictly true, however,
because the two models differ somewhat. The earlier problem concerning
marksmen and shells is a good one to illustrate the difference. The test
based on (7) assumes that six men are selected and each performs four
experiments, one with each brand of shells. The test based on (8) assumes
that 24 men are selected and each performs one experiment with the
brand of shells assigned him. In the first model the six men could have
been selected at random, or otherwise, from a population of marksmen;
however, it is assumed that in repetitions of the experiment the same six
men are used. In the second model it is assumed that a fresh set of 24
men is selected at random from a population of marksmen every time the
experiment is performed.


12.2.3 Components of Variance Model 

This model makes a linearity assumption about the basic variable $x_{ij}$
rather than about its mean, as was done in the linear hypothesis
model. In place of assumption (1), it is assumed that $x_{ij}$ can be expressed
in the form

(9) $x_{ij} = u_i + v_j + w_{ij}$

where the $u_i$, $v_j$, and $w_{ij}$ are independent normal variables. The $u_i$ are
assumed to possess the same normal distribution with mean $\mu_a$ and
variance $\sigma_a^2$; the $v_j$ are assumed to possess the same normal distribution
with mean $\mu_b$ and variance $\sigma_b^2$; and the $w_{ij}$ are assumed to possess the
same normal distribution with mean $\mu_c$ and variance $\sigma_c^2$. From these
assumptions, it follows that

$E(x_{ij}) = \mu_a + \mu_b + \mu_c$

and

$\sigma_{x_{ij}}^2 = \sigma_a^2 + \sigma_b^2 + \sigma_c^2$

These results, together with (9), show that the variables $x_{ij}$ are normally,
but not independently, distributed with the same means and variances.
The lack of independence is obvious if one compares, say, the variables
$x_{11} = u_1 + v_1 + w_{11}$ and $x_{12} = u_1 + v_2 + w_{12}$. Since $x_{11}$ and $x_{12}$ contain
the common variable $u_1$, with the remaining variables on the right being
independent, they must be correlated.
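
The correlation between two entries in the same row can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the text: it sets all three means to zero (the covariance does not depend on them), picks arbitrary standard deviations, and uses a hypothetical function name `sample_covariance`. Because $u_1$ is the only term shared by $x_{11}$ and $x_{12}$, their covariance equals the variance of $u_1$.

```python
import random


def sample_covariance(n_trials, sigma_a, sigma_b, sigma_c, seed=0):
    """Estimate cov(x_11, x_12) under model (9) by repeated simulation."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_trials):
        u1 = rng.gauss(0.0, sigma_a)  # row variable shared by both entries
        x11 = u1 + rng.gauss(0.0, sigma_b) + rng.gauss(0.0, sigma_c)
        x12 = u1 + rng.gauss(0.0, sigma_b) + rng.gauss(0.0, sigma_c)
        pairs.append((x11, x12))
    mean1 = sum(p[0] for p in pairs) / n_trials
    mean2 = sum(p[1] for p in pairs) / n_trials
    return sum((p[0] - mean1) * (p[1] - mean2) for p in pairs) / (n_trials - 1)


cov = sample_covariance(50_000, sigma_a=2.0, sigma_b=1.0, sigma_c=1.0)
print(cov)  # should lie near sigma_a**2 = 4
```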

In the linear hypothesis model considered earlier, the variables $x_{ij}$
were assumed to be normally, and independently, distributed with the
same variances but with different means.

The analysis of variance technique for the components of variance 
model proceeds in much the same manner as for the linear hypothesis 
model. One starts with the same breakdown of the basic sum of squares 
given by (4) and shows that the second and third sums of squares on the 
right can be incorporated into an F test for the hypothesis that there are 
no column effects. However, in the components of variance model this 
hypothesis assumes the form 

(10) $H_0: \sigma_b^2 = 0$

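For the purpose of demonstrating that (5) possesses a $\chi^2$ distribution
in this model also, use (9) to obtain

$$
\begin{aligned}
\bar{x}_{\cdot j} - \bar{x}
&= \frac{1}{a} \sum_{i=1}^{a} x_{ij} - \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} x_{ij} \\
&= (\bar{u} + v_j + \bar{w}_{\cdot j}) - (\bar{u} + \bar{v} + \bar{w}) \\
&= (v_j + \bar{w}_{\cdot j}) - (\bar{v} + \bar{w})
\end{aligned}
$$

Letting $y_j = v_j + \bar{w}_{\cdot j}$, it follows that

$\bar{y} = \dfrac{1}{b} \sum_{j=1}^{b} y_j = \bar{v} + \bar{w}$

hence that

(11) $\bar{x}_{\cdot j} - \bar{x} = y_j - \bar{y}$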

Since $y_j$ is a linear combination of the variables $v_j, w_{1j}, \ldots, w_{aj}$, which
are independent normal variables, it follows that $y_j$ is a normal variable.
Furthermore, since each of the variables $y_1, y_2, \ldots, y_b$ is a linear
combination of a different set of independent normal variables, it follows that
the $y_j$ $(j = 1, \ldots, b)$ are independently normally distributed. The mean
of $y_j$ is given by

$E(y_j) = E(v_j) + E(\bar{w}_{\cdot j}) = \mu_b + \mu_c$

The variance of $y_j$ is given by

$\sigma_{y_j}^2 = \sigma_{v_j}^2 + \sigma_{\bar{w}_{\cdot j}}^2 = \sigma_b^2 + \dfrac{\sigma_c^2}{a}$

The preceding results show that the $y_j$ are independently normally
distributed with the same means, $\mu_b + \mu_c$, and the same variances,
$\sigma_b^2 + \sigma_c^2/a$; therefore, by Theorem 1, Chapter 11, it follows that the quantity
corresponding to $ns^2/\sigma^2$ for $y_1, y_2, \ldots, y_b$ will possess a $\chi^2$ distribution.
Because of (11), this quantity is

(12) $\dfrac{\sum_{j=1}^{b} (y_j - \bar{y})^2}{\sigma_b^2 + \sigma_c^2/a} = \dfrac{\sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2}{\sigma_b^2 + \sigma_c^2/a} = \dfrac{\sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2}{a\sigma_b^2 + \sigma_c^2}$


The preceding derivation proves that (12) possesses a $\chi^2$ distribution
with $b - 1$ degrees of freedom. This corresponds to (5) for the linear
hypothesis model. As with the linear hypothesis model, it is considerably
more difficult to convert the last sum of squares of (4) into a $\chi^2$ variable
and to show that it is distributed independently of (12). The results of
such a demonstration are accepted here without proof. Thus, it can be
shown that

(13) $\dfrac{1}{\sigma_c^2} \sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2$

possesses a $\chi^2$ distribution with $(a - 1)(b - 1)$ degrees of freedom and
that (12) and (13) are independently distributed. From Theorem 4,
Chapter 11, it therefore follows that the F distribution may be applied
to (12) and (13) to give

(14) $F = \dfrac{\sigma_c^2 (a - 1) \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2}{(a\sigma_b^2 + \sigma_c^2) \sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2}$

with $\nu_1 = b - 1$ and $\nu_2 = (a - 1)(b - 1)$.

It is clear from (14) that the value of F can never be calculated in a
given problem unless $\sigma_b^2/\sigma_c^2$ is known. But, when $H_0$ is true, it follows
from (10) that this ratio is equal to 0 and that the value of F in (14) can
be evaluated from the data of the experiment. This result may be
summarized in the following manner.





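(15) Components of Variance F Test: If the variables $x_{ij}$ are expressible
in the form $x_{ij} = u_i + v_j + w_{ij}$, where the $u_i$, $v_j$, and $w_{ij}$ are three
sets of random samples from three normal populations, the hypothesis
$H_0: \sigma_b^2 = 0$, where $\sigma_b^2$ is the variance for the second normal population,
may be tested by using the right tail of the F distribution as critical region,
where

$$
F = \frac{(a - 1) \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2}
         {\sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2}
$$

and where $\nu_1 = b - 1$ and $\nu_2 = (a - 1)(b - 1)$.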

A comparison of (7) and (15) shows that the test for the hypothesis 
that there are no column effects is the same for the two models; however, 
the mathematical formulation of this hypothesis is quite different in the 
two schemes, as is evident on comparing (3) and (10). In order to see 
what these mathematical differences in formulation imply experimentally, 
consider once more the experiment suggested in 12.2 of testing four brands 
of shells with six marksmen. 

The linear hypothesis model assumes that the experiment in question
is a random sample of similar experiments in which the same four brands
of shells and the same six marksmen are used. As a consequence, if $H_0$
is accepted, it implies that there is no real difference in the quality of the
four brands as far as these six marksmen are concerned. It might happen,
however, that a different set of marksmen would show up differences in
the four brands. Thus in the linear hypothesis model the conclusions
drawn from the F test are strictly applicable only for the given marksmen.

The components of variance model assumes that the experiment in
question was obtained by selecting six marksmen at random from a
population of marksmen and by selecting four brands of shells at random
from a population of brands. As a consequence, if $H_0$ is accepted, it
implies that the population of brands consists essentially of one brand
only, because $\sigma_b^2 = 0$. In the components of variance model the
conclusions drawn from the F test apply to the population of marksmen and
the population of brands.

In the present illustration neither model seems to be entirely appro- 
priate. It would be desirable to have a test that is concerned with the four 
chosen brands of shells only, since they alone are of interest, but it would 
also be desirable to have the test applicable to a population of marksmen. 
Thus a mixture of the two models would appear to be the most realistic 
model for this illustration. The preceding methods can be extended to 
cover mixed cases also, but such an extension is not considered here. 




12.2.4 Analysis of Variance Estimation 

Although the analysis of variance technique was presented as a
technique for testing hypotheses, it is also a very useful tool for obtaining
estimates of the various parameters, or functions of the parameters,
involved in the two models.

Consider, first, the problem of estimating the parameters involved
in the linear hypothesis model. From (1) the parameters $c$, $a_i$, and $b_j$
were defined as the expected values of the corresponding random variables
$\bar{x}$, $\bar{x}_{i\cdot} - \bar{x}$, and $\bar{x}_{\cdot j} - \bar{x}$; hence these random variables yield unbiased
estimates of those parameters. Thus, using a circumflex to denote an
unbiased estimate,

$\hat{c} = \bar{x}, \quad \hat{a}_i = \bar{x}_{i\cdot} - \bar{x}, \quad \hat{b}_j = \bar{x}_{\cdot j} - \bar{x}$

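An unbiased estimate of $\sigma^2$ is given by

(16) $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{a} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2}{(a - 1)(b - 1)}$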




A demonstration of the fact that this estimator is unbiased can be based
on (6). It was accepted that (6) possesses a $\chi^2$ distribution with
$(a - 1)(b - 1)$ degrees of freedom. From problem 7, Chapter 11, it is known that
the expected value of a $\chi^2$ variable is equal to its degrees of freedom;
hence the expected value of (16) must be $\sigma^2$. The fact that (16) is an
unbiased estimator of $\sigma^2$ could be demonstrated without the aid of (6)
by using expected value operator methods.

The preceding estimators may be used to give unbiased estimates of
interesting functions of the parameters. For example, an experimenter
interested in estimating the difference between two treatment effects
corresponding, say, to the first and second column effects, could use the
estimator $\hat{b}_1 - \hat{b}_2 = \bar{x}_{\cdot 1} - \bar{x}_{\cdot 2}$.
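
As an illustration, these estimators can be evaluated for the potato data of Table 2. The sketch below is not part of the original computations; the variable names (`c_hat`, `a_hat`, `b_hat`, `sigma2_hat`, `diff_ab`) are hypothetical, and the remainder sum of squares is computed directly from its definition in (16).

```python
# Table 2 yields: rows are plots 1-4, columns are treatments A-E.
yields = [
    [310, 353, 366, 299, 367],
    [284, 293, 335, 264, 314],
    [307, 306, 339, 311, 377],
    [267, 308, 312, 266, 342],
]
a, b = len(yields), len(yields[0])
grand = sum(sum(row) for row in yields) / (a * b)
row_means = [sum(row) / b for row in yields]
col_means = [sum(row[j] for row in yields) / a for j in range(b)]

c_hat = grand                              # estimate of the general mean c
a_hat = [m - grand for m in row_means]     # estimates of the row (plot) effects
b_hat = [m - grand for m in col_means]     # estimates of the column (treatment) effects

ss_rem = sum(
    (yields[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(a)
    for j in range(b)
)
sigma2_hat = ss_rem / ((a - 1) * (b - 1))  # estimator (16)
diff_ab = b_hat[0] - b_hat[1]              # treatment A effect minus treatment B effect

print(c_hat, a_hat, b_hat)
print(sigma2_hat, diff_ab)
```

The estimated effects sum to zero over rows and over columns, in agreement with (2).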

Consider, next, the problem of estimating the parameters involved
in the components of variance model. The parameters here are $\mu_a$, $\mu_b$,
$\mu_c$, $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$. In most applications of this model the experimenter’s
estimation interests are centered on the total mean and the
individual variances and not on individual means. The total mean, namely
$E(x_{ij}) = \mu_a + \mu_b + \mu_c$, obviously can be estimated by $\bar{x}$. Estimates for
the individual variances can be obtained by using the appropriate sums
of squares listed in Table 4.

Table 4

Source of    Sum of squares                                                                   d.f.              Mean square              Expected mean square
variation
Rows         $\sum\sum (\bar{x}_{i\cdot} - \bar{x})^2 = S_1$                                  $a - 1$           $S_1/(a - 1)$            $\sigma_c^2 + b\sigma_a^2$
Columns      $\sum\sum (\bar{x}_{\cdot j} - \bar{x})^2 = S_2$                                 $b - 1$           $S_2/(b - 1)$            $\sigma_c^2 + a\sigma_b^2$
Remainder    $\sum\sum (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2 = S_3$     $(a - 1)(b - 1)$  $S_3/[(a - 1)(b - 1)]$   $\sigma_c^2$

The last column gives the expected values of the entries in the fourth
column. These values can be verified by using expected value operator
methods. For example, the second entry in the last column can be
obtained as follows:


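$E(S_2) = E \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2 = aE \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2$

But from (11) this is equivalent to

(17) $E(S_2) = aE \sum_{j=1}^{b} (y_j - \bar{y})^2$

From the discussion following (11) it is clear that the $y_j$ are independent
normal variables with the same means and variances, namely, $\mu_b + \mu_c$ and
$\sigma_b^2 + \sigma_c^2/a$. As a consequence, $\sum_{j=1}^{b} (y_j - \bar{y})^2/(b - 1)$ will be an unbiased estimate of
this variance; that is,

$E \sum_{j=1}^{b} \dfrac{(y_j - \bar{y})^2}{b - 1} = \sigma_b^2 + \dfrac{\sigma_c^2}{a}$

Using this result in (17) yields

$E(S_2) = a(b - 1)\left( \sigma_b^2 + \dfrac{\sigma_c^2}{a} \right)$

The expected value of the second entry in the fourth column of Table 4
is therefore given by

$E\left( \dfrac{S_2}{b - 1} \right) = a\sigma_b^2 + \sigma_c^2$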

The first entry in the last column of Table 4 follows by symmetry, 
whereas the last entry can be verified by methods similar to those employed 
in the preceding demonstration. 




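From Table 4 it now follows that unbiased estimates of $\sigma_a^2$, $\sigma_b^2$, and
$\sigma_c^2$ are given by

$\hat{\sigma}_a^2 = \dfrac{S_1}{b(a - 1)} - \dfrac{S_3}{b(a - 1)(b - 1)}$

$\hat{\sigma}_b^2 = \dfrac{S_2}{a(b - 1)} - \dfrac{S_3}{a(a - 1)(b - 1)}$

$\hat{\sigma}_c^2 = \dfrac{S_3}{(a - 1)(b - 1)}$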
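
The way these estimates follow from the mean squares of Table 4 can be sketched as below. The function name `variance_components` is hypothetical, and feeding it the sums of squares of Table 3 is purely an arithmetic illustration: it assumes, without support from the text, that the potato experiment could be treated under the components of variance model.

```python
def variance_components(s1, s2, s3, a, b):
    """Unbiased estimates of the row, column, and remainder variances from Table 4."""
    ms_rows = s1 / (a - 1)
    ms_cols = s2 / (b - 1)
    ms_rem = s3 / ((a - 1) * (b - 1))
    var_c = ms_rem                    # E(ms_rem)  = sigma_c^2
    var_a = (ms_rows - ms_rem) / b    # E(ms_rows) = sigma_c^2 + b * sigma_a^2
    var_b = (ms_cols - ms_rem) / a    # E(ms_cols) = sigma_c^2 + a * sigma_b^2
    return var_a, var_b, var_c


# Sums of squares taken from Table 3 (rows = plots, columns = treatments).
var_a, var_b, var_c = variance_components(6430, 12712, 2388, a=4, b=5)
print(var_a, var_b, var_c)
```

Because each estimate of $\sigma_a^2$ or $\sigma_b^2$ is a difference of mean squares, it can turn out negative in a particular sample even though the parameter itself cannot.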

The accuracy of the various estimates obtained in this section has not 
been discussed because some of the theory relating to accuracy is some- 
what incomplete and lengthy. 


12.2.5 Generalizations 

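Both the linear hypothesis model and the components of variance
model can be generalized to cover situations in which there are more
than two variables of classification. Thus, if there were three variables
of classification, the fundamental identity (4) would assume the form

(18)
$$
\begin{aligned}
\sum\sum\sum (x_{ijk} - \bar{x})^2
&= \sum\sum\sum (\bar{x}_{i\cdot\cdot} - \bar{x})^2
 + \sum\sum\sum (\bar{x}_{\cdot j\cdot} - \bar{x})^2
 + \sum\sum\sum (\bar{x}_{\cdot\cdot k} - \bar{x})^2 \\
&\quad + \sum\sum\sum (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x})^2
 + \sum\sum\sum (\bar{x}_{i\cdot k} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot k} + \bar{x})^2 \\
&\quad + \sum\sum\sum (\bar{x}_{\cdot jk} - \bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot k} + \bar{x})^2 \\
&\quad + \sum\sum\sum (x_{ijk} - \bar{x}_{ij\cdot} - \bar{x}_{i\cdot k} - \bar{x}_{\cdot jk} + \bar{x}_{i\cdot\cdot} + \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot k} - \bar{x})^2
\end{aligned}
$$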

The general theory for the linear hypothesis model shows that one
proceeds in the same manner as before. Thus it is merely necessary 
to divide each of the preceding sums of squares by its degrees of freedom 
and then take the proper ratio, depending on the hypothesis to be tested, 
to obtain an F variable. For example, if the equality of row means were 
being tested, one would choose the first and last sums of squares on the 
right of (18) to form the F ratio. The new feature in (18), not found in 
(4), is that now there are sums of squares that measure the interaction 
between two variables. Thus the fourth sum of squares on the right 
measures the extent to which the first and second variables interact on 
each other. If, for example, different amounts of two different chemicals 
were applied to experimental plots of ground, it might happen that in- 
creased amounts of each chemical alone would increase yield but that 
when both chemicals were applied equally no appreciable increase in 
yield would result. 





For the purpose of seeing how the fourth sum of squares is capable of
measuring the interaction of the first two variables, consider the expression

$$(\bar{x}_{ij} - \bar{x}_{i\cdot}) - (\bar{x}_{\cdot j} - \bar{x}) = y_{ij} - \bar{y}_{\cdot j}$$

This quantity is the typical term, before squaring and with the third
variable dots omitted, in the sum of squares being discussed. If the row
effects were strictly additive, then subtracting the row mean $\bar{x}_{i\cdot}$ from
every cell entry in a two-way table would yield a set of observational
values that are random except for column effects. The sample variance
of these adjusted values in any given column would therefore yield
an estimate of the basic variance $\sigma^2$. Since $\bar{y}_{\cdot j}$ is the column mean of the
$y_{ij}$, it follows that $\sum_{i=1}^{a} (y_{ij} - \bar{y}_{\cdot j})^2$, when divided by the appropriate number
of degrees of freedom, would be expected to be a valid estimate of $\sigma^2$
and therefore that the double sum when divided by the proper number
would yield such an estimate also.

Now suppose that the two variables do not act independently of each 
other in this additive fashion and that, say, the first row variable is bene- 
ficial in conjunction with the first column variable but harmful otherwise. 
Then the value of $y_{11} = \bar{x}_{11} - \bar{x}_{1\cdot}$ would be expected to be larger than
under independence, whereas the values of the $y_{1j}$ for $j = 2, \dots, b$
would be expected to be smaller (larger negatively) than under
independence. If the other row variables also interact in various ways with
the column variables, the net effect will be to produce sets of $y_{ij}$ in the
various columns that are more variable than under independence. As 
a result, the sum of squares being discussed will tend to be larger when 
interaction is present than when it is absent. The general theory of analy- 
sis of variance shows that a valid test of the hypothesis that there is no 
interaction between the first and second variables is given by applying 
the F test to the fourth and last sums of squares in the usual manner. 

An analysis of variance design in which there are two variables of 
classification but in which there are k observations in each cell can be 
treated as a special case of a three-variable problem. Since the index $k$ on
$x_{ijk}$ corresponds to a $k$th replication rather than to a third variable, the
fundamental identity (18) would be rewritten to eliminate terms involving
a segregation for the third variable. Thus the third, fifth, and sixth terms 
on the right would be combined with the seventh term to give a new 
remainder term. The breakdown would then become 

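(19)
$$
\begin{aligned}
\sum_i\sum_j\sum_k (x_{ijk}-\bar{x})^2
 ={}& \sum_i\sum_j\sum_k (\bar{x}_{i\cdot\cdot}-\bar{x})^2
    + \sum_i\sum_j\sum_k (\bar{x}_{\cdot j\cdot}-\bar{x})^2 \\
 &+ \sum_i\sum_j\sum_k (\bar{x}_{ij\cdot}-\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot j\cdot}+\bar{x})^2
    + \sum_i\sum_j\sum_k (x_{ijk}-\bar{x}_{ij\cdot})^2
\end{aligned}
$$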

The analysis then proceeds as usual. It is now possible to test for inter-
action between the two variables, whereas when there was but one obser- 
vation per cell, as in (4), this was not possible. 
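
As a rough numerical check of the breakdown (19), the following Python sketch computes the four sums of squares for a two-way table with several observations per cell and forms the F ratio for interaction. The function name, the data layout (`x[i][j]` holding the list of replicates for row i and column j), and the sample data are illustrative assumptions and do not come from the text.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_breakdown(x):
    """Sums of squares for a two-way table with replication, as in (19).

    x[i][j] is the list of replicate observations in row i, column j
    (hypothetical layout; every cell must hold the same number of values).
    """
    a, b, reps = len(x), len(x[0]), len(x[0][0])
    grand = mean(v for row in x for cell in row for v in cell)
    row_mean = [mean(v for cell in x[i] for v in cell) for i in range(a)]
    col_mean = [mean(v for i in range(a) for v in x[i][j]) for j in range(b)]
    cell_mean = [[mean(x[i][j]) for j in range(b)] for i in range(a)]

    total = sum((v - grand) ** 2 for row in x for cell in row for v in cell)
    rows = b * reps * sum((m - grand) ** 2 for m in row_mean)
    cols = a * reps * sum((m - grand) ** 2 for m in col_mean)
    interaction = reps * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    remainder = sum(
        (v - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for v in x[i][j]
    )
    # F ratio for interaction: each sum of squares divided by its degrees of freedom.
    f_interaction = (interaction / ((a - 1) * (b - 1))) / (remainder / (a * b * (reps - 1)))
    return {
        "total": total, "rows": rows, "columns": cols,
        "interaction": interaction, "remainder": remainder,
        "F_interaction": f_interaction,
    }


# Made-up data: 2 rows, 3 columns, 2 observations per cell.
data = [[[7, 9], [12, 14], [10, 11]],
        [[14, 13], [15, 17], [9, 8]]]
result = two_way_breakdown(data)
print(result)
```

The four components should add up to the total sum of squares, which gives a quick check on the arithmetic before the F ratio is compared with a tabled value.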

The general theory for the components of variance model shows that
the nice generalization, just discussed, that holds for the linear hypoth-
esis model does not apply to the components of variance model. When
there are three or more variables of classification, it is not possible to
apply the F test as usual to test the hypothesis that there are no row
differences. The F test is applicable to this problem only if it can be
assumed that one of the theoretical interactions involving the row
variable is zero; otherwise, it is necessary to estimate an unknown
parameter from the data and thus use an inexact method.

The material presented here on the analysis of variance is only an
introduction to this important topic. Books on experimental design
discuss many other models and generalizations and give detailed discus-
sions of applications.


12.3 Stratified Sampling 

The technique of breaking down the variation of a variable into useful 
components in order to decrease the experimental variation, as done in 
the analysis of variance, can also be used to advantage in designing 
experiments for estimating means of populations. It turns out that a 
more accurate estimate of the mean can often be obtained by taking 
restricted random samples than by using completely random samples. 
For example, suppose that an accurate estimate of the mean weight of 
fifth grade pupils is desired by a school system. By taking the proper 
size random samples in the various age groups, or in the various schools
of the system, a more accurate estimate of the population mean can 
usually be obtained than by taking the same total sample at random in 
the system. In order to determine the proper size subsamples, consider 
the following general problem. 

Let a population be divided into $k$ distinct subpopulations. Further,
let the mean and variance of this population be $\mu$ and $\sigma^2$ and of the $i$th
subpopulation, $\mu_i$ and $\sigma_i^2$. Then consider as estimates of $\mu$ the quantities
$\bar{x}$ and $\bar{x}_s$, where $\bar{x}$ is the mean of a random sample of size $n$ and where

(20)  $$\bar{x}_s = \frac{n_1}{n}\bar{x}_1 + \cdots + \frac{n_k}{n}\bar{x}_k$$

in which $\bar{x}_i$ is the mean of a random sample of size $n_i$ drawn from the $i$th
subpopulation and $\sum_{i=1}^{k} n_i = n$. This restricted type of random sampling
is called stratified sampling.

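For the purpose of comparing the relative precision of these two
estimates of $\mu$, consider their respective variances. The variance of $\bar{x}$
is given by $\sigma_{\bar{x}}^2 = \sigma^2/n$. Since the $\bar{x}_i$ are independent, the variance of (20)
is given by

(21)  $$\sigma_{\bar{x}_s}^2 = \sum_{i=1}^{k} \left(\frac{n_i}{n}\right)^2 \frac{\sigma_i^2}{n_i}$$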



In order to express the variance of $\bar{x}$ in terms of the $\sigma_i^2$, it is necessary
to express the frequency function of the population in terms of those of
the subpopulations. This may be done by applying the two basic rules of
probability to the problem of determining the probability that $x$ will
assume a value within any specified interval. If $p_i$ denotes the probability
that $x$ will come from the $i$th subpopulation and $f_i(x)$ denotes the fre-
quency function for this subpopulation,

$$p_i \int_{\alpha}^{\beta} f_i(x)\,dx$$

represents the probability that $x$ will come from the $i$th subpopulation
and will assume a value between $\alpha$ and $\beta$. Since these subpopulations are
mutually exclusive, the probability that $x$ will assume a value between
$\alpha$ and $\beta$ is the sum of all such probabilities; hence

$$\int_{\alpha}^{\beta} f(x)\,dx = p_1 \int_{\alpha}^{\beta} f_1(x)\,dx + \cdots + p_k \int_{\alpha}^{\beta} f_k(x)\,dx$$

But $\alpha$ and $\beta$ are arbitrary; consequently by the same reasoning followed
in (2), Chapter 8,

$$f(x) = p_1 f_1(x) + \cdots + p_k f_k(x)$$

Now

(22)  $$\mu = \int_a^b x f(x)\,dx = p_1 \int_a^b x f_1(x)\,dx + \cdots + p_k \int_a^b x f_k(x)\,dx = p_1\mu_1 + \cdots + p_k\mu_k$$

Furthermore

$$\sigma^2 = \int_a^b x^2 f(x)\,dx - \mu^2 = p_1(\sigma_1^2 + \mu_1^2) + \cdots + p_k(\sigma_k^2 + \mu_k^2) - \mu^2$$

If the term $\mu^2$ is rewritten by means of (22) and the fact that $\sum_{i=1}^{k} p_i = 1$,
this reduces to

$$\sigma^2 = \sum_{i=1}^{k} p_i[\sigma_i^2 + (\mu_i - \mu)^2]$$

From this result it follows that the variance of $\bar{x}$ can be written in the
form

(23)  $$\sigma_{\bar{x}}^2 = \frac{1}{n}\sum_{i=1}^{k} p_i[\sigma_i^2 + (\mu_i - \mu)^2]$$

Now consider a special type of sampling called representative sampling
in which the subpopulation sample sizes $n_i$ are chosen so that $n_i/n = p_i$.
For a finite population this means that the relative sizes of the subpopula-
tion samples are chosen equal to the relative sizes of the subpopulations.
For representative sampling, (23) may be reduced by means of (21) to
the form

(24)  $$\sigma_{\bar{x}}^2 = \sigma_{\bar{x}_s}^2 + \frac{1}{n}\sum_{i=1}^{k} p_i(\mu_i - \mu)^2$$

This shows that $\sigma_{\bar{x}}^2 > \sigma_{\bar{x}_s}^2$, unless the subpopulations have equal means.
Representative sampling is of particular advantage for populations
whose subpopulations have widely differing means.

Public opinion polls are familiar examples of representative sampling.
For such polls it is customary to stratify the population in several ways. 
For example, it may be divided into several income groups, into several 
vocational groups, etc. Then, within strata, random samples are taken 
proportional to the relative sizes of those strata. 

Various other types of restricted random sampling are available, most 
of which have been developed by governmental and industrial agencies 
for their particular needs. 

As an illustration of the increased precision of estimating $\mu$ by the
use of representative sampling, suppose for the sake of simplicity that a
district is made up of 45 per cent Democrats and 55 per cent Republicans
and that 70 per cent of the Democrats will vote for a certain "nonpartisan"
candidate in a primary election but only 20 per cent of the Republicans
will do so. Now suppose that a sample of size 200 is taken by each method.
Although experience indicates that the precision of poll percentages is
not so great as that given by binomial theory, the precisions here will be
compared on a theoretical basis; consequently

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{(.425)(.575)}{200} = .00122$$

and

$$\frac{1}{n}\sum_{i=1}^{2} p_i(\mu_i - \mu)^2 = \frac{90(.70 - .425)^2}{(200)^2} + \frac{110(.20 - .425)^2}{(200)^2} = .00031$$

Therefore, from (24)

$$\sigma_{\bar{x}_s}^2 = .00091$$


Since $\sigma_{\bar{x}_s}^2/\sigma_{\bar{x}}^2 = .75$ here, a considerable increase in precision would
result from using representative sampling in preference to pure random 
sampling. Formulas (23) and (24) are valid for discrete variables. 
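
The arithmetic of this illustration can be retraced with a short Python sketch. It evaluates the population mean from (22), the variance of the simple random sample mean from (23), and the variance under representative sampling from (21) with each subsample size taken as n times its subpopulation probability. The function and variable names are illustrative assumptions; only the percentages and the sample size come from the example.

```python
def stratified_comparison(weights, means, variances, n):
    """Return (mu, var_random, var_representative) for total sample size n.

    weights   -- p_i, the subpopulation probabilities (assumed to sum to 1)
    means     -- mu_i, the subpopulation means
    variances -- sigma_i^2, the subpopulation variances
    """
    mu = sum(p * m for p, m in zip(weights, means))                  # formula (22)
    within = sum(p * v for p, v in zip(weights, variances)) / n      # (21) with n_i = n * p_i
    between = sum(p * (m - mu) ** 2 for p, m in zip(weights, means)) / n
    return mu, within + between, within                              # (23) and (21)


weights = [0.45, 0.55]                      # Democrats, Republicans
means = [0.70, 0.20]                        # proportion voting for the candidate
variances = [m * (1 - m) for m in means]    # binomial variance within each group
mu, var_random, var_representative = stratified_comparison(weights, means, variances, 200)

print(round(mu, 3), round(var_random, 5), round(var_representative, 5))
print(round(var_representative / var_random, 2))
```

Here the within-group variances are taken to be binomial, in keeping with the remark that the comparison is made on a theoretical basis.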


12.4 Sampling Inspection 

The discussion thus far in this chapter has been concerned with tech- 
niques for designing valid experiments and for increasing the sensitivity 
of such experiments. In the remainder of this chapter the emphasis will 
be on a method that directly tries to minimize the amount of sampling 
needed to attain a desired sensitivity in the experiment. Since a smaller 
size analysis of variance experiment usually suffices to attain the same 
sensitivity as a corresponding more elementary design, the analysis of 
variance can be considered indirectly as a technique for decreasing the 
amount of sampling needed to attain a desired sensitivity. Thus all the 
techniques of this chapter can be thought of as techniques for decreasing 
the amount of sampling needed to attain the desired objective. 

One of the most useful applications of the design of experiments to mini- 
mize the amount of sampling occurs in industrial sampling inspection. 
If a certain type of sampling procedure is agreed on, the notion of the 
two types of error can be used to advantage to design a good inspection 
procedure. 

It is a common practice in industry to accept or reject lots of merchan- 
dise on the basis of a sample drawn from the lot. This practice arises 
from the fact that it is often more economical to tolerate a small percent- 
age of defectives than to bear the cost of 100 per cent inspection. The 
basis for accepting a lot of merchandise usually consists in specifying 
the maximum number of defective pieces that will be tolerated in a random 
sample of a given size. By means of such samples and specifications the 
purchaser is protected against receiving bad lots of merchandise. 

Sampling inspection is quite different from quality control. It is a 
method for protecting the purchaser against poor quality after the prod- 
uct has been manufactured rather than a method for finding and correct- 
ing flaws in the manufacturing process, as in quality-control methods. 



When sampling inspection methods are applied to continuous manufacturing
processes, however, they are often useful in helping to control the
quality of the product.

From the consumer’s point of view, there is a maximum percentage
of defectives that he will tolerate. This percentage when expressed as a
decimal is known as the lot tolerance fraction defective and is denoted by
$p_t$. Without almost 100 per cent inspection, it may be impossible to be
certain that the quality is better than $p_t$; however, it is possible to set up a
sampling procedure that will insure this quality with a certain probability.
To this end consider a lot of $N$ pieces from which a random sample of $n$
pieces is to be selected. Let $c$ denote the maximum number of defective
pieces that will be tolerated in the sample if the lot is to be accepted.

Although numerous sampling schemes are available, only one common 
type of sampling procedure, known as single sampling, is considered here. 
This scheme proceeds as follows: 

(25) (i) Inspect a sample of $n$ pieces.

(ii) If the number of defective pieces does not exceed $c$,
accept the lot; otherwise, inspect the entire lot.

(iii) Replace all defective pieces found by nondefective pieces.

Now consider the computation of the probability that the consumer
will receive a bad lot under this sampling procedure, where bad is defined
to be quality worse than $p_t$. Suppose that the lot being submitted for
inspection is of $p$ fraction defective so that it contains exactly $Np$ defective
and $N - Np$ nondefective items. Then the probability of obtaining
exactly $x$ defectives in a sample of size $n$ is given by the ratio of the number
of ways of choosing $x$ things from $Np$ things and $n - x$ things from
$N - Np$ things to the number of ways of choosing $n$ things from $N$ things.
Using the hypergeometric distribution given in (36), Chapter 5, this
probability, $P\{x\}$, may be expressed in the form

(26)  $$P\{x\} = \frac{\binom{Np}{x}\binom{N - Np}{n - x}}{\binom{N}{n}}$$

Under the sampling scheme (25), the consumer will accept a bad lot pro-
vided that $p > p_t$ and that $x \le c$. Assuming, therefore, that $p > p_t$, the
probability that the consumer will accept a bad lot is given by summing the
probabilities given by (26) for $x = 0, 1, \dots, c$. Now it can be
shown that $\sum_{x=0}^{c} P\{x\}$ decreases as $p$ increases; consequently the probability
that the consumer will accept a bad lot will be a maximum when $p$ is as
small as possible. But $p$ cannot be less than $p_t$ if the lot is to be judged as
bad; hence the probability that the consumer will accept a bad lot cannot
exceed the value

(27)  $$P_c = \sum_{x=0}^{c} \frac{\binom{Np_t}{x}\binom{N - Np_t}{n - x}}{\binom{N}{n}}$$



This value is given a special name. 


(28) Definition: The probability $P_c$ that a consumer will accept a lot
of fraction defective $p_t$, where $p_t$ is his lot tolerance fraction defective,
is called the consumer’s risk.

By demanding a small value of $P_c$ the consumer is protected against
poor quality. If the actual fraction defective $p$ is smaller than $p_t$, the
consumer will be satisfied with the lot, whereas if it is larger than $p_t$ he
will wish to reject the lot, and the probability that he will fail to do so
will not exceed $P_c$. Thus $P_c$ is a conservative estimate of the probability
that he will accept a bad lot.
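
The consumer's risk can be evaluated directly from the hypergeometric sum. The Python sketch below is one way to do so; the function name is an illustrative assumption, and the lot size, sample size, and acceptance number are borrowed from the illustration in Section 12.4.1. It also evaluates the acceptance probability for a few lots worse than the tolerance, to illustrate numerically that the probability falls as the number of defectives in the lot rises.

```python
from math import comb


def acceptance_probability(N, n, c, defectives):
    """Probability of finding at most c defectives in a sample of n pieces
    drawn without replacement from a lot of N pieces, `defectives` of them bad."""
    favorable = sum(
        comb(defectives, x) * comb(N - defectives, n - x)
        for x in range(min(c, n) + 1)
    )
    return favorable / comb(N, n)


N, n, c = 1000, 130, 3
p_t = 0.05

# Consumer's risk: acceptance probability for a lot exactly at the tolerance p_t.
consumer_risk = acceptance_probability(N, n, c, round(N * p_t))

# Lots worse than the tolerance (60, 80, 100 defectives) are accepted less often.
worse_lots = [acceptance_probability(N, n, c, d) for d in (60, 80, 100)]

print(consumer_risk, worse_lots)
```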

From the producer’s point of view any sampling scheme for deciding 
on the quality of a lot possesses the disadvantage of occasionally rejecting 
a lot of satisfactory quality. Most producers, however, are concerned 
principally with the percentage of the lots that are likely to be rejected, 
whether those lots deserve to be rejected or not. Thus the producer is 
interested in knowing what the probability is that a lot of $N$ selected at
random from his production line will be rejected. In order to calculate
this probability, assume that the manufacturing process is under control 
with a process fraction defective p. This means that individual items coming 
off the production line may be considered to be random samples from a 
binomial population for which the probability that an item will be defec- 
tive is p. From this point of view selecting a random sample of size N 
from the production process and then selecting a subsample of size n 
from the sample of N already selected is equivalent to selecting a random 
sample of size n directly from the production process. Since a lot will be 
rejected only if $x > c$, it follows that the probability that a lot will be
rejected is equivalent to the probability of getting more than c successes 
in n trials of an experiment for which p is the probability of success in a 
single trial. The desired probability is therefore given by 


(29)  $$P_p = \sum_{x=c+1}^{n} \frac{n!}{x!\,(n - x)!}\, p^x q^{n-x}$$


This value is also given a special name. 



(30) Definition: The probability $P_p$ that a producer will have a lot
rejected when his production process is under control is called the pro-
ducer’s risk.

The producer’s risk is sometimes defined with respect to a single lot
of size $N$, similar to the consumer’s risk, in which case the binomial term
being summed in (29) would be replaced by (26). The definition given by
(30) is more realistic from the producer’s point of view because he is inter-
ested in the long-run percentage of lots that will be rejected rather than in a
particular lot of given quality. It should be observed that the consumer’s
risk does not depend on the process fraction defective $p$, whereas the
producer’s risk does.


12.4.1 Minimum Single Sampling

Thus far nothing has been said concerning the method of selecting
values of $n$ and $c$. The consumer’s requirements fix the values of $p_t$
and $P_c$ in (27). Since $N$ is specified, (27) places a single restriction on $n$
and $c$. Now from the producer’s point of view one desirable method of
approach is to select that pair of values which minimizes the amount of
inspection. Since a sample of size $n$ is always inspected and the remainder
of the lot is inspected with a probability given by (29), the mean number
of pieces inspected per lot under the sampling scheme (25) will be given
by

(31)  $$I = n + (N - n)P_p$$

In order to satisfy the consumer’s demands and also minimize the amount
of inspection, it is necessary to find that pair of values of $n$ and $c$ which
satisfies (27) and minimizes (31). These quantities are difficult to manip-
ulate; consequently the minimizing solution is obtained numerically for
different values of the parameters involved. Extensive tables are available
for the minimizing values of $n$ and $c$ for various values of the parameters
and for $P_c$ chosen equal to .10.

As an illustration, consider a lot of 1000 pieces for which the fraction
defective is $p = .01$ and for which the consumer is willing to assume a
risk of $P_c = .10$ of accepting a lot with a fraction defective of $p_t = .05$.
By allowing $c$ to assume small integral values and working numerically by
trial and error methods, it will be found that the minimum amount of
inspection will occur if a sample of 130 is taken and if the maximum
allowable number of defectives is 3. With these values of $n$ and $c$, it will
also be found that the mean number of pieces inspected will be 164 as long
as production remains in control. These results are easily obtained by
consulting the Dodge and Romig tables referred to at the end of this
chapter.
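
The trial and error search described above can be sketched in Python. The sketch evaluates the consumer's risk from the hypergeometric sum (27), the producer's risk from the binomial sum (29), and the mean amount of inspection from (31). For each acceptance number c it takes the smallest sample size n that meets the consumer's risk. The function names, the cap `max_c` on the acceptance numbers tried, and the use of exact sums are assumptions of this sketch, so the plan it finds may differ slightly from the tabled values.

```python
from math import comb


def consumer_risk(N, n, c, p_t):
    """Formula (27): probability of accepting a lot of fraction defective p_t."""
    bad = round(N * p_t)
    accept = sum(comb(bad, x) * comb(N - bad, n - x) for x in range(c + 1))
    return accept / comb(N, n)


def producer_risk(n, c, p):
    """Formula (29): binomial probability of more than c defectives in n."""
    return 1.0 - sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))


def mean_inspection(N, n, c, p):
    """Formula (31): mean number of pieces inspected per lot."""
    return n + (N - n) * producer_risk(n, c, p)


def minimum_single_sampling(N, p, p_t, risk=0.10, max_c=8):
    """Search small acceptance numbers c for the plan with least inspection.

    Assumes N * p_t exceeds max_c, so that a large enough sample always
    meets the consumer's risk.
    """
    best = None
    for c in range(max_c + 1):
        # For fixed c the risk falls as n grows and the amount of inspection
        # rises, so the smallest n that meets the risk is the one to keep.
        n = next(m for m in range(c + 1, N + 1)
                 if consumer_risk(N, m, c, p_t) <= risk)
        inspected = mean_inspection(N, n, c, p)
        if best is None or inspected < best[2]:
            best = (n, c, inspected)
    return best


n, c, inspected = minimum_single_sampling(N=1000, p=0.01, p_t=0.05)
print(n, c, round(inspected, 1))
print(round(mean_inspection(1000, 130, 3, 0.01), 1))   # the plan quoted in the text
```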


12.4.2 Average Outgoing Quality Limit 

A somewhat different approach to the problem of protecting the con- 
sumer from an inferior product is to attempt to guarantee him a certain 
quality level of the product after inspection, regardless of what quality 
level is being maintained by the producer. Toward this end, consider the 
problem of determining the mean value of the fraction defective after 
inspection if the producer’s fraction defective is p. 

First, it is necessary to derive a formula for calculating the expected 
value of a variable y in terms of conditional expected values of y when the 
population is split into two groups by means of a related variable x. For 
example, suppose the mean grade-point average of students in a given 
college is desired. It might be interesting to obtain the mean grade-point 
averages of students whose intelligence-quotient scores are less than 
110 and of those whose scores are at least 110 and then combine the two 
means properly to give the desired mean. 

Suppose that each member of a finite population is measured with
respect to two variables $x$ and $y$. Let the population be split into two
parts by the criterion that $x \le c$ or $x > c$ and let $y_1, y_2, \dots, y_k$ denote
the possible values of $y$. The mean value of $y$ is given by

(32)  $$E[y] = \sum_{i=1}^{k} y_i P\{y_i\}$$

Now, the probability $P\{y_i\}$ that $y$ will assume the value $y_i$ can be obtained
by considering the two mutually exclusive ways in which this event can
occur. Either the individual selected at random will belong to the first
group ($x \le c$) and possess this value or he will belong to the second
group ($x > c$) and possess this value. The sum of the probabilities of
these two possibilities will yield $P\{y_i\}$; hence

$$P\{y_i\} = P\{x \le c\}P\{y_i \mid x \le c\} + P\{x > c\}P\{y_i \mid x > c\}$$

If this formula is substituted in (32), it will follow that

(33)  $$E[y] = P\{x \le c\}\sum_{i=1}^{k} y_i P\{y_i \mid x \le c\} + P\{x > c\}\sum_{i=1}^{k} y_i P\{y_i \mid x > c\}$$

But each of the two sums on the right, when compared with (32), is seen
to be a conditional expected value of $y$. As a result, (33) may be written
in the form

(34)  $$E[y] = P\{x \le c\}E[y \mid x \le c] + P\{x > c\}E[y \mid x > c]$$

In order to use this formula to determine the mean value of the fraction
defective after inspection, let $y$ be the number of defectives left in a lot
when the procedure given in (25) is followed. When $x > c$, the entire
lot will be inspected and all defective pieces will be replaced by nondefec-
tive pieces; hence the value of $y$ will be 0 and therefore the value of
$E[y \mid x > c]$ will be 0. When $x \le c$, there will be $N - n$ uninspected
pieces; hence the value of $y$ can range from 0 to $N - n$. But, since these
$N - n$ pieces constitute a random sample of this size from a binomial
population with probability $p$, the mean number of defectives will be
$(N - n)p$; hence this is the value of $E[y \mid x \le c]$. If these values are
inserted in (34), it will reduce to

(35)  $$E[y] = P\{x \le c\}(N - n)p$$

In order to obtain the mean fraction defective after inspection, rather
than the mean number of defectives as given by (35), it is merely necessary
to divide both sides of (35) by $N$. If $\bar{p}$ denotes the mean fraction defective
after inspection, it therefore follows that

(36)  $$\bar{p} = \frac{E[y]}{N} = P\{x \le c\}\left(1 - \frac{n}{N}\right)p$$

Since $x$ is the binomial variable discussed in connection with (29), $P\{x \le c\}$
may be expressed as the sum of the corresponding binomial probabilities;
hence (36) can be written

(37)  $$\bar{p} = p\left(1 - \frac{n}{N}\right)\sum_{x=0}^{c} \frac{n!}{x!\,(n - x)!}\, p^x q^{n-x}$$

This formula gives the mean fraction defective after inspection when
following the inspection procedure (25); however, it is more commonly
called the average outgoing quality.

When the sampling procedure given by (25) has been specified, the
values of $N$, $n$, and $c$ may be treated as given. Although (37) was calculated
on the assumption that the producer’s fraction defective is $p$ and is known,
the consumer is not likely to accept the producer’s claim that $p$ is the actual
process fraction defective; hence $p$ may not be treated as given. If $\bar{p}$
is considered as a function of $p$ only, it will be found that $\bar{p}$ ordinarily
possesses a maximum value that is assumed for a single value of $p$. This
maximum value of $\bar{p}$, denoted by $p_L$, is given the following special name.





(38) Definition: The maximum value, $p_L$, of the mean fraction defective
after inspection, as a function of $p$, is called the average outgoing quality
limit.

The average outgoing quality limit is a number such that, regardless
of what the producer’s fraction defective may be, the mean fraction
defective after inspection will not exceed it. This does not prevent a
particular lot from containing worse quality, but in the long run the
average value of the fraction defective in the inspected lots will not
exceed $p_L$. Since these calculations are based on the assumption that the
binomial distribution may be applied to defective parts coming off the
production line, it is necessary that the production process be under
control at some quality level $p$, even though the particular value of $p$
being maintained is irrelevant.

The average outgoing quality limit has a certain appeal to many con- 
sumers that is not possessed by the protection afforded by a specified 
consumer’s risk. For this reason, it is widely used as a basis for consumer 
protection.

It is usually possible to select several pairs of values of $c$ and $n$ that will
yield functions, $\bar{p}$, having approximately the same value of $p_L$. Figure 1
illustrates a typical situation. Since it is immaterial to the consumer
which pair of values of $c$ and $n$ is chosen for a specified value of $p_L$, the
producer is at liberty to choose them to his advantage. From his point
of view it would be highly desirable to select that pair of values which mini-
mizes the amount of inspection given by (31). As in the minimum single
sampling scheme of the preceding section, the minimizing pair of values
of $c$ and $n$ is obtained numerically. Tables are also available for deter-
mining these minimizing values corresponding to useful ranges of values
of $N$, $p_L$, and $p$. It should be noted that the value of the process fraction
defective $p$ is required in order to minimize $I$, just as it was in the case of
minimum single sampling.



Fig. 1. Sampling plans with approximately equal $p_L$ values.



As an illustration of the preceding ideas, consider the problem that was
used as an illustration for minimum single sampling. There $N = 1000$
and $p = .01$. If the consumer requests an average outgoing quality limit
of, say, $p_L = .03$, the Dodge and Romig tables referred to previously will
give $c = 2$ and $n = 44$ as the values that will minimize the amount of
inspection.
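
The average outgoing quality limit of a given plan can be approximated numerically by evaluating (37) over a grid of values of the process fraction defective and taking the largest result. The Python sketch below does this for the plan quoted above. The function names and the grid spacing are illustrative assumptions, and a grid search only approximates the true maximum.

```python
from math import comb


def average_outgoing_quality(N, n, c, p):
    """Formula (37): mean fraction defective after inspection."""
    accept = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))
    return p * (1 - n / N) * accept


def average_outgoing_quality_limit(N, n, c, steps=2000):
    """Approximate the maximum of (37) by scanning p over an evenly spaced
    grid on [0, 1]; returns (largest value found, p at which it occurs)."""
    grid = (i / steps for i in range(steps + 1))
    return max((average_outgoing_quality(N, n, c, p), p) for p in grid)


p_L, p_at_max = average_outgoing_quality_limit(N=1000, n=44, c=2)
print(round(p_L, 4), round(p_at_max, 4))
```

For this plan the computed limit should come out close to the requested value of .03.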


REFERENCES 


An extensive discussion of fundamental principles such as randomization and replica-
tion may be found in R. A. Fisher, The Design of Experiments, Oliver and Boyd.

A mathematical proof that the F distribution may be applied to the analysis of variance
problems as indicated in the text may be found in D. A. S. Fraser, Statistics: An Introduc-
tion, John Wiley and Sons.

A discussion of the difficulties that arise in the two analysis of variance models when
there are three or more variables may be found in A. M. Mood, Introduction to the Theory
of Statistics, McGraw-Hill Book Co.

Tables for assisting in the design of efficient sampling inspection schemes may be
found in Dodge and Romig, Sampling Inspection Tables, John Wiley and Sons.


EXERCISES 

1. Given the following analysis of variance breakdown and the corresponding
numerical values of the sums of squares, test the hypothesis that the column
means are equal.

i=lj=l 

400 160 240 i; 

2. Suppose that the last sum of squares in problem 1 had been further analyzed 
to measure the row variability and had yielded the indicated numerical values. 
Now test the hypothesis that the column means are equal. 

2 2 =22 ~ +22 * j 

i=lj=l ■. 

240 100 140 :! 

3. Assuming that the row means of problem 1 are equal, use the last sum of
squares in that problem to construct an unbiased estimate of $\sigma^2$.

4. Use the last sum of squares in problem 2 to find 98 per cent confidence
limits for $\sigma^2$.

5. The following table gives the gains of 4 different types of hogs fed 3 different
rations. Test to see whether the rations or the hog types differ in their effect on
mean weight.

                      Type
Ration       I      II     III     IV
  A          7      16     10.5    13.5
  B         14      15.5   15      21
  C          8.5    16.5    9.5    13.5

6. The following data represent the number of units of production per day
turned out by 5 different workmen using 4 different types of machines. (a) Test
to see whether the mean productivity is the same for the 4 different machine
types. (b) Test to see whether the 5 men differ with respect to mean productivity.

                  Machine Type
Workman      1      2      3      4
   1        44     38     47     36
   2        46     40     52     43
   3        34     36     44     32
   4        43     38     46     33
   5        38     42     49     39

7. Suppose that an analysis of variance experiment involving 10 rows and 4 

columns gave a significant result when testing the hypothesis that the column 
means are equal and that 2 of the sample column means appeared to be larger 
than the rest. The error sum of squares occurring in the denominator of the F 
test, based on 36 degrees of freedom, was equal to 180. If you wished to test 
the new hypothesis $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 > \mu_2$ for the 2 columns of interest,

approximately how large a sample of equal size should you expect to take from 
each if you wanted to be certain with a probability of about .90 of detecting a 
difference of — /Mg = ^ 

8. For the components of variance model, derive an expression for the ex- 
pected value of the second sum of squares on the right side of the identity 

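$$\sum_{i=1}^{a}\sum_{j=1}^{b} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (x_{ij} - \bar{x}_{\cdot j})^2$$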




First, express it in terms of the variables and (i = 1, •••, a; y = 1^' • • • , 6), 
where it is assumed that = Vj- + and then calculate the necessary expected 
values, 

9. For the data of problem 1 use the formula obtained in problem 8 and
Table 4 to obtain an unbiased estimate of and of 

10. Describe an analysis of variance experiment for which it is clear that the
linear hypothesis model is the natural model for testing row or column variability.
Describe an experiment for which the components of variance model is the
natural model to use.

11. Using the Poisson approximation

$$\sum_{x=0}^{c} \frac{e^{-np}(np)^x}{x!}$$

with $p$ equal to .05 and .01, respectively, to obtain the values of $P_c$ and $P_p$ given
by (27) and (29), respectively, verify that the values of $c$ and $n$ given in the illus-
tration on minimum single sampling are approximately correct to yield $P_c = .10$
and $I = 164$.

12. Using the Poisson approximation of problem 11, determine by numerical
methods the values of $c$ and $n$ that minimize the amount of inspection for $N = 400$,
$P_c = .10$, $p_t = .05$, and $p = .02$. Proceed by assigning $c$ a value, beginning
with 2, then by determining the value of $n$ to satisfy (27), and finally by selecting
that pair of values which makes (31) a minimum.

13. Explain why you should prefer to calculate the producer’s risk from
formula (29) rather than from the corresponding formula based on the
hypergeometric distribution used in (26).

14. Consult the Dodge and Romig tables to verify that your results in problem
12 are approximately correct. Use these same tables to determine the average
outgoing quality limit for this problem.

15. Show that $P\{x \mid p\} \ge P\{x \mid p'\}$, where $Np' = Np + 1$, if $x < n(Np + 1)/(N + 1)$.
This shows that $\sum_{x=0}^{c} P\{x \mid p\} \ge \sum_{x=0}^{c} P\{x \mid p'\}$ provided that $c < np$, and
that $\sum_{x=0}^{c} P\{x \mid p\}$ is a decreasing function of $p$ as assumed in the discussion follow-
ing formula (26) in the text.

16. A sample of size $n$ is to be taken from a population made up of $k$
strata consisting of $N_i$ ($i = 1, 2, \dots, k$) members. If $n_i$ denotes the size sample
to be taken from the $i$th stratum and $c_i$ the cost per sample from this stratum,
and if a total of $c = \sum_{i=1}^{k} c_i n_i$ dollars can be spent for the sample, show that the
variance of the estimate of the population mean will be a minimum if $n_i$ is chosen
proportional to $N_i\sigma_i/\sqrt{c_i}$, where $\sigma_i^2$ is the variance for the $i$th stratum population.

17. A population is made up of $k$ subpopulations. The probabilities of success
for an experiment for these subpopulations are $p_1, p_2, \dots, p_k$, respectively.
A set of $n$ experiments is conducted by first drawing one of the $k$ subpopulations
at random and then performing the $n$ experiments with it. If $x$ denotes the number
of successes obtained, show that the variance of $x$ is given by
$V(x) = n\mu_p(1 - \mu_p) + n(n - 1)V(p)$, where $\mu_p$ is the mean of the $p$’s and $V(p)$ is
their variance. Explain what this means with respect to sampling from a population
made up of highly different subpopulations.

18. If infected plants tend to occur in groups that are randomly distributed
over an area and if $p$ is the proportion of sampling areas that contain at least 1
group of infected plants, show that an estimate of the plant density of infected
plants is $-\mu\log(1 - p)$, where $\mu$ is the mean number of infected plants per
group.

19. Suppose a population consists of 2 subpopulations with means 20 and 30, 
respectively, and a common variance of 5. If the two subpopulations are equally 
probable in random sampling, calculate the advantage in estimating the mean in 
taking 2 samples of 50 each from the 2 subpopulations over taking a random 
sample of 100 from the entire population. 

20. Let p denote a cost factor that represents what a single sample costs when
it is necessary to sample from 1 of the subpopulations in problem 19 as contrasted
to 1 unit of cost for a single random sample from the entire population. Thus,
if p = 1.2, it follows that 120 random samples will cost the same as 100 samples
from a subpopulation. Determine the largest value of p such that the 2 methods
in problem 19 will cost the same and yield the same precision in estimating
the mean.

21. Why is it reasonable to take the ratio of the variances rather than the ratio
of the standard deviations of 2 estimates to compare the estimates?



CHAPTER 13 


Nonparametric Methods 


Most of the statistical methods that have been considered so far have
possessed two features in common. They have assumed that the functional
form of the basic frequency function is known and have been concerned
with testing hypotheses about parameters of this frequency function or
with estimating its parameters. For example, all the small sample methods
developed in Chapter 11, with the exception of the material on the range,
require that the basic variable be normally distributed and are concerned
with testing or estimating means and variances of those variables. The
$\chi^2$ distribution of Chapter 10, however, was not restricted to problems of
this type and is a striking exception to the general pattern of methods found
in other chapters.

For situations in which very little is known about the distribution of
the basic variable or for which it is known that the distribution is not of
the required type, it is necessary to develop methods that do not depend
on the particular form of the basic frequency function. A number of
methods of this type have been designed. The only assumption that is
needed for most of these methods is that the frequency function be
continuous. A few of them, however, require that the frequency function
possess low order moments.

Since the methods being described are not concerned with testing or 
estimating the parameters of a frequency function of a given type, they 
are usually called nonparametric methods. Such methods are also called 
distribution-free methods because they do not require a knowledge of how 
the basic variables are distributed. Since neither name is strictly correct 
for all of the methods usually listed under these names, the first name is 
used here because of tradition. Although a large number of nonparametric
techniques are available to solve certain types of problems, only a
few of the more important ones that are fairly easy to discuss are con- 
sidered in this chapter. All of the techniques to be discussed are con- 
cerned with testing hypotheses, except for one, which was designed for 
estimating a distribution function. 



13.1 Sign Test

In this and the next section the problem to be considered is that of 
testing whether two unknown frequency functions are identical. This 
problem arises, for example, when the same population is sampled on 
two different occasions and there is reason to believe that the population 
may be changing. If it were known that the two populations were normal, 
then one could use the methods developed in Chapter 11 for testing the
equality of means and variances to solve the problem; however, since the 
population distribution is assumed to be unknown here, a nonparametric 
method is needed. The method that is about to be discussed was designed 
for experiments in which the same size sample is taken from each of the 
populations and in which the experimental values are paired. For example, 
in determining whether two types of coating for soil pipe are equally 
resistant to corrosion, the experimenter would ordinarily subject each 
type of coating to the same set of soil types and thus obtain a pair of experi- 
mental values for each soil type. 

Let $f_1(x)$ and $f_2(x)$ be the two continuous frequency functions under
discussion and let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote n paired sample
values to be drawn from the two populations. Consider the hypothesis

(1)    $H_0: f_1(x) = f_2(x)$

For the purpose of testing this hypothesis, it is convenient to consider
the differences $x_i - y_i$ $(i = 1, 2, \ldots, n)$. When $H_0$ is true, $x_i$ and $y_i$
constitute a random sample of size two from the same population. Since
the probability that the first of two sample values will exceed the second
is the same as the probability that the second will exceed the first and
since, theoretically, the probability of a tie is zero, it follows that the
probability that $x_i - y_i$ will be positive is $\tfrac{1}{2}$. Thus, if only the signs of the
differences are considered, a nonparametric test for $H_0$ can be constructed.
Toward this end, let

$z_i = \begin{cases} 1, & \text{if } x_i - y_i > 0 \\ 0, & \text{if } x_i - y_i < 0 \end{cases}$

Then the variable $z_i$ is a binomial variable corresponding to a single
trial of an experiment for which $p = \tfrac{1}{2}$. Since the $z_i$ are independent, their
sum $u = \sum_{i=1}^{n} z_i$ will be a binomial variable corresponding to n independent
trials of an experiment for which $p = \tfrac{1}{2}$.




In order to use this last result for testing $H_0$, consider as an alternative
to $H_0$ the hypothesis

(2)    $H_1: f_2(x) = f_1(x + c)$

where c is some positive constant. $H_1$ states that the second frequency
function is merely the first frequency function shifted to the left a distance
of c units. Figure 1 illustrates the relationship between $f_1(x)$ and $f_2(x)$.

Under $H_1$ the $x_i$ will tend to be larger than the $y_i$ and the variable u
will tend to exceed its expected value of $n/2$. One would therefore choose
as critical region the right tail of the binomial distribution. If c had been
negative, the left tail would have been chosen; however, if $H_1$ were the
alternative that a translation of unknown direction had occurred, both
tails would be used.

The test that has just been described is known as the sign test. It is an
extremely simple test to apply. A useful feature of the test is that it is
applicable to situations in which the frequency functions $f_1(x)$ and $f_2(x)$,
although identical under $H_0$ for each pair of samples, change from sample
pair to sample pair. For example, in the illustration of soil-pipe coatings
it might happen that for one type of soil there is very little corrosion for
either coating, whereas for another type there is a great deal of corrosion
for both coatings. One would expect the variation in the amount of
corrosion for the first soil type to be considerably smaller than the variation
for the second. Thus not only the mean but also the standard deviation
would differ for the two soil types.

As an illustration of how the sign test is applied, consider the data
found in Table 1.

These data were obtained by taking random samples of 30 from two
normal populations with means 14 and 16 and standard deviations 2 and
2, respectively. The reason for choosing normal data is that it is interesting
to compare this and other nonparametric methods with standard methods
based on normality.

Fig. 1. A frequency function and its translation.

Table 1

x   13.3  14.6  13.6  17.2  14.1  10.6  15.9  14.7  14.2  14.0
y   14.1  15.1   9.9  14.5  17.9  16.1  16.8  15.1  13.2  18.0

x   17.4  15.6   8.2  13.8  15.4  16.3  17.7  15.0  13.4  13.4
y   16.3  13.3  15.8  18.0  20.4  15.7  21.5  14.5  16.7  13.7

x   16.0  13.3  14.9  12.9  14.0  16.2  11.5  10.4  12.6  18.1
y   13.6  17.0  15.7  16.8  18.8  18.8  16.0  14.6  12.3  17.7

If the differences $x_i - y_i$ are taken, it will be found that there are 10
positive and 20 negative differences; hence $u = 10$. Since the normal
approximation to a binomial distribution with $p = \tfrac{1}{2}$ is excellent for n
as large as 30, u may be treated as a normal variable here.

Because it is known that the alternative given by (2) holds here with
$c = -2$, let the hypothesis to be tested be that given by (1) and let the
alternative be that given by (2), where $c < 0$. The left tail of the u
distribution will therefore be chosen as the critical region for the test. In
order to test the hypothesis, it suffices to calculate the probability that
$u < 10.5$. Thus

(3)    $\tau = \dfrac{10.5 - 15}{\tfrac{1}{2}\sqrt{30}} = -1.643$

hence $P\{u < 10.5\} = .050$. Since this probability is the borderline value
for making a decision, one might toss a coin or compute P more accurately;
however, the hypothesis is known to be false here.
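
The arithmetic behind (3) can be checked directly. The following Python sketch is an illustration only and is not part of the original development; it assumes the Table 1 values as transcribed here, uses nothing beyond the standard library, and also evaluates the exact binomial tail that the text alludes to when it suggests computing P more accurately.

```python
from math import comb, erf, sqrt

# Table 1 values, transcribed pair by pair (x_i, y_i).
x = [13.3, 14.6, 13.6, 17.2, 14.1, 10.6, 15.9, 14.7, 14.2, 14.0,
     17.4, 15.6, 8.2, 13.8, 15.4, 16.3, 17.7, 15.0, 13.4, 13.4,
     16.0, 13.3, 14.9, 12.9, 14.0, 16.2, 11.5, 10.4, 12.6, 18.1]
y = [14.1, 15.1, 9.9, 14.5, 17.9, 16.1, 16.8, 15.1, 13.2, 18.0,
     16.3, 13.3, 15.8, 18.0, 20.4, 15.7, 21.5, 14.5, 16.7, 13.7,
     13.6, 17.0, 15.7, 16.8, 18.8, 18.8, 16.0, 14.6, 12.3, 17.7]

n = len(x)

# u counts the positive differences x_i - y_i (the sum of the z_i).
u = sum(1 for xi, yi in zip(x, y) if xi > yi)

# Normal approximation with the half-unit correction used in (3):
# a binomial variable with p = 1/2 has mean n/2 and standard deviation sqrt(n)/2.
tau = (u + 0.5 - n / 2) / (sqrt(n) / 2)
p_normal = 0.5 * (1 + erf(tau / sqrt(2)))  # standard normal CDF at tau

# Exact binomial left tail P{u <= 10} under H0, for comparison.
p_exact = sum(comb(n, k) for k in range(u + 1)) / 2 ** n

print(u, round(tau, 3), round(p_normal, 4), round(p_exact, 4))
```

The exact tail should come out slightly below the normal approximation, which is why the result sits so close to the 5 per cent borderline.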

If this problem is worked on the assumption that the differences of 
the paired values can be treated as random sample values of a normal 
variable and if Student’s t is calculated for the differences, it will be 
found that 

t = -3.16 


For 29 degrees of freedom, the .005 point for the t distribution is 2.76;
hence the probability of obtaining a value less than -3.16 is much smaller
than .005. A comparison of this result with that above using the sign
test shows that the t test was able to demonstrate the real difference
existing in the two populations with greater assurance than the sign test.
This, of course, is to be expected when the normality assumption holds
because the t test is known to be an optimum test for this type of problem.
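
For readers who wish to reproduce the value of t, a minimal Python sketch follows. It is an illustration under the assumption that the Table 1 values are transcribed correctly; it computes Student's t for the paired differences from first principles rather than through any statistical library.

```python
from math import sqrt

# Table 1 values, transcribed pair by pair (x_i, y_i).
x = [13.3, 14.6, 13.6, 17.2, 14.1, 10.6, 15.9, 14.7, 14.2, 14.0,
     17.4, 15.6, 8.2, 13.8, 15.4, 16.3, 17.7, 15.0, 13.4, 13.4,
     16.0, 13.3, 14.9, 12.9, 14.0, 16.2, 11.5, 10.4, 12.6, 18.1]
y = [14.1, 15.1, 9.9, 14.5, 17.9, 16.1, 16.8, 15.1, 13.2, 18.0,
     16.3, 13.3, 15.8, 18.0, 20.4, 15.7, 21.5, 14.5, 16.7, 13.7,
     13.6, 17.0, 15.7, 16.8, 18.8, 18.8, 16.0, 14.6, 12.3, 17.7]

d = [xi - yi for xi, yi in zip(x, y)]  # paired differences
n = len(d)
d_bar = sum(d) / n                     # mean difference

# Unbiased sample variance of the differences (n - 1 in the denominator).
s2 = sum((di - d_bar) ** 2 for di in d) / (n - 1)

# Student's t with n - 1 = 29 degrees of freedom for the hypothesis of zero mean difference.
t = d_bar / sqrt(s2 / n)

print(round(t, 2))
```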




The sign test can also be used to test the hypothesis that the median
of a distribution has a given value. Since the median point on a
distribution is one such that the probability is $\tfrac{1}{2}$ that a sample value will
exceed it, the median is essentially a nonparametric property of a
distribution. Testing the hypothesis that the median has a given value is a
nonparametric analogue of testing whether the mean has a given value. Instead
of observing the signs of the differences $x_i - y_i$ as in the preceding
problem, one works with the signs of the quantities $x_i - \xi$, where $\xi$ is the
postulated median value. The test procedure is then the same as before.
Although the sign test may appear to be rather weak, it can be shown
that it possesses certain optimum properties for testing a hypothetical
median when nothing is known about the underlying distribution. It is
only when one compares the sign test with a test based on a given
distribution, such as the normal distribution, that the sign test suffers.
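
The median version of the test may be sketched in the same way. In the Python illustration below, the postulated median 14 is a hypothetical value chosen only for the example, the sample is the y row of Table 1, and the dropping of any value exactly equal to the postulated median is an assumption of the sketch (the theory treats such ties as having probability zero).

```python
from math import comb

# The y values of Table 1, used here only as a convenient sample.
y = [14.1, 15.1, 9.9, 14.5, 17.9, 16.1, 16.8, 15.1, 13.2, 18.0,
     16.3, 13.3, 15.8, 18.0, 20.4, 15.7, 21.5, 14.5, 16.7, 13.7,
     13.6, 17.0, 15.7, 16.8, 18.8, 18.8, 16.0, 14.6, 12.3, 17.7]

postulated_median = 14.0  # hypothetical value, chosen for illustration only

# Assumption of this sketch: values equal to the postulated median are dropped.
usable = [v for v in y if v != postulated_median]
n = len(usable)

# u is the number of sample values exceeding the postulated median.
u = sum(1 for v in usable if v > postulated_median)

# Right tail of the binomial distribution with p = 1/2, appropriate when the
# alternative is that the true median exceeds the postulated value.
p_right = sum(comb(n, k) for k in range(u, n + 1)) / 2 ** n

print(n, u, p_right)
```

Since these y values were drawn from a population whose mean, and hence median, is 16, a small right-tail probability is to be expected here.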

13.2 Rank Sum Test

If the data for testing the hypothesis $H_0: f_1(x) = f_2(x)$ do not consist of
matched pairs, the sign test is not the natural test to apply. This is
particularly true if the sizes of the samples from the two populations differ
considerably because then the sign test will waste some of the data. A
simple nonparametric test for this more general situation can be obtained
by studying the possible ordered arrangements of the combined sample
values.

Let $x_1', x_2', \ldots, x_{n_1}'$ and $y_1', y_2', \ldots, y_{n_2}'$ denote random samples of
sizes $n_1$ and $n_2$ taken from the populations with the continuous frequency
functions $f_1(x)$ and $f_2(x)$, respectively. Let these two sets of sample values
be arranged in order of increasing magnitude and denote the ordered
sets by $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$. If the two ordered sets are
combined into a single ordered set, a typical arrangement such as

(4)    $y_1, y_2, x_1, y_3, x_2, y_4, y_5, x_3, \ldots$

will be obtained. When $f_1(x) = f_2(x)$, the $x_i'$ and $y_j'$ represent random
samples from the same population. The combined set $x_1', \ldots, x_{n_1}'$,
$y_1', \ldots, y_{n_2}'$ therefore represents a random sample of size $n_1 + n_2$ from
this common population. Since the sampling is random, any particular
order of these sample values should have the same probability of occurring
as any other order. For example, the first sample value to be drawn, $x_1'$, has
the same probability of being, say, the largest value obtained as the
second sample value $x_2'$. Thus each of the $(n_1 + n_2)!$ possible permutations of
the variables $x_1', \ldots, x_{n_1}', y_1', \ldots, y_{n_2}'$ has the same probability of being
the ordered set of values. In calculating the probability that a particular
type of ordered set will be obtained, it is therefore necessary to count
the number of the $(n_1 + n_2)!$ permutations that give rise to the desired
order type and divide this number by $(n_1 + n_2)!$.

Although it makes no difference which variable is labeled x and which
y, it is convenient when consulting the tables that have been constructed
for the test to be presented here to have $n_1 \le n_2$ and therefore to label
the variables accordingly.

After this joint ordering, as in (4), has been performed, write down the
ranks of the x values and let T denote the sum of those ranks. For
example, in (4) one would write down beneath the consecutive x values
displayed there the numbers 3, 5, 8, ... because those are the ranks of
the x values in the combined set. The value of T would then be the sum of
those numbers.

Now the sampling distribution of T under the assumption that $f_1(x) = f_2(x)$,
and therefore based on the resulting assumption that all $(n_1 + n_2)!$
permutations of the combined set of values have the same probability of
being the ordered set, has been worked out by combinatorial methods.
The distribution of T depends, of course, on the sizes, $n_1$ and $n_2$, of the
two samples. Table VII in Appendix 2 gives the necessary critical values
corresponding to various sample sizes for $n_2 \le 10$. For larger sample
sizes the distribution of T can be approximated satisfactorily by the proper
normal distribution. This is the normal distribution with mean and
variance given by the formulas

(5)    $E(T) = \dfrac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_T^2 = \dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
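
The combinatorial basis of these formulas can be seen on a small case. The Python sketch below is an illustration, not the method by which the tables were constructed; the function name `rank_sum_distribution` and the sample sizes 3 and 4 are choices made for the example. It enumerates every set of ranks the x values could occupy, each set being equally likely when all orderings are equally likely, and compares the exact mean and variance of T with the formulas for the mean and variance.

```python
from fractions import Fraction
from itertools import combinations


def rank_sum_distribution(n1, n2):
    """Exact distribution of T, the sum of the x ranks, under equal likelihood.

    Each choice of n1 rank positions for the x values corresponds to the same
    number (n1! n2!) of permutations, so the choices are equally likely.
    """
    counts = {}
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        t = sum(ranks)
        counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in counts.items()}


n1, n2 = 3, 4  # small sizes chosen for illustration
dist = rank_sum_distribution(n1, n2)

mean = sum(t * p for t, p in dist.items())
variance = sum((t - mean) ** 2 * p for t, p in dist.items())

formula_mean = Fraction(n1 * (n1 + n2 + 1), 2)
formula_variance = Fraction(n1 * n2 * (n1 + n2 + 1), 12)

print(mean, variance, formula_mean, formula_variance)
```

Exact rational arithmetic is used so that the agreement with the formulas is an identity rather than a matter of rounding.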

The problem used to illustrate the sign test is also used to illustrate
the rank sum test. If the 30 values of x and the 30 values of y from Table
1 are ordered in magnitude, the following combined ordering will be
obtained. The y values are those marked with an asterisk. The tied values
of x and y were alternated in the ordering for all tied pairs.

(6)
 8.2,   9.9*, 10.4,  10.6,  11.5,  12.3*, 12.6,  12.9,  13.2*, 13.3,
13.3*, 13.3,  13.4,  13.4,  13.6*, 13.6,  13.7*, 13.8,  14.0,  14.0,
14.1,  14.1*, 14.2,  14.5*, 14.5*, 14.6*, 14.6,  14.7,  14.9,  15.0,
15.1*, 15.1*, 15.4,  15.6,  15.7*, 15.7*, 15.8*, 15.9,  16.0*, 16.0,
16.1*, 16.2,  16.3,  16.3*, 16.7*, 16.8*, 16.8*, 17.0*, 17.2,  17.4,
17.7*, 17.7,  17.9*, 18.0*, 18.0*, 18.1,  18.8*, 18.8*, 20.4*, 21.5*




In order to calculate the value of T, it is necessary to write down the
ranks of the x values in (6).

1, 3, 4, 5, 7, 8, 10, 12, 13, 14, 16, 18, 19, 20, 21, 23,
27, 28, 29, 30, 33, 34, 38, 40, 42, 43, 49, 50, 52, 56

The sum of these ranks is $T = 745$. Since $n_1 = n_2 = 30$, application of
formulas (5) will show that

$E(T) = 915, \qquad \sigma_T^2 = 4575$

Consequently

$\tau = \dfrac{T - E(T)}{\sigma_T} = \dfrac{745 - 915}{67.6} = -2.51$

Since $\tau$ is approximately a standard normal variable, it follows from
Table II that $P\{\tau < -2.51\} = .006$. A comparison of this result with
that of the sign test shows that the rank sum test is superior to the sign
test for this problem. The value of $t = -3.16$ for the Student’s t test
shows, however, that the t test is still somewhat better than either of the
nonparametric tests for this problem, as was to be expected.
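
The rank sum for the Table 1 data can also be obtained mechanically. The Python sketch below is an illustration under one stated assumption: tied values are given the average of the ranks they occupy (midranks) instead of being alternated as in the text, because alternation depends on an arbitrary starting choice. With midranks the sum comes out one unit smaller than the 745 found above, and the conclusion is unchanged.

```python
from math import erf, sqrt

# Table 1 values.
x = [13.3, 14.6, 13.6, 17.2, 14.1, 10.6, 15.9, 14.7, 14.2, 14.0,
     17.4, 15.6, 8.2, 13.8, 15.4, 16.3, 17.7, 15.0, 13.4, 13.4,
     16.0, 13.3, 14.9, 12.9, 14.0, 16.2, 11.5, 10.4, 12.6, 18.1]
y = [14.1, 15.1, 9.9, 14.5, 17.9, 16.1, 16.8, 15.1, 13.2, 18.0,
     16.3, 13.3, 15.8, 18.0, 20.4, 15.7, 21.5, 14.5, 16.7, 13.7,
     13.6, 17.0, 15.7, 16.8, 18.8, 18.8, 16.0, 14.6, 12.3, 17.7]

combined = sorted(x + y)


def midrank(value):
    """Average of the ranks occupied by `value` in the combined ordering."""
    first = combined.index(value) + 1  # rank of the first occurrence
    count = combined.count(value)      # number of tied values
    return first + (count - 1) / 2


n1, n2 = len(x), len(y)
T = sum(midrank(v) for v in x)  # rank sum of the x values (midrank convention)

mean_T = n1 * (n1 + n2 + 1) / 2       # E(T) from formulas (5)
var_T = n1 * n2 * (n1 + n2 + 1) / 12  # variance of T from formulas (5)

tau = (T - mean_T) / sqrt(var_T)
p_left = 0.5 * (1 + erf(tau / sqrt(2)))  # standard normal CDF at tau

print(T, mean_T, var_T, round(tau, 2), round(p_left, 3))
```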

When testing the hypothesis $H_0: f_1(x) = f_2(x)$ against the alternative
hypothesis that the first population is situated to the left of the second
population, the left tail of the T distribution should obviously be chosen as the
critical region. One should, of course, use the right tail if the shift has
been to the right. The rank sum test is known to be an excellent test for
the type of problem being considered here. It can be shown, for example,
that even when the variables are normal, the rank sum test based on 100
observations is approximately as good as Student’s t test based on 95
observations.


13.3 Runs

Most of the statistical methods that have been considered in the preceding
chapters were designed to be applied to data for which no useful
information was gained by preserving the time order of the observations. 
It was assumed that the observations constitute a random sample from a 
fixed population, in which case the time order can be ignored. If there is 
reason to believe that the observations may not behave like a random set 
when they are taken over some time interval, then it is necessary to test 
the randomness of the sequence before the usual statistical methods based 
on randomness can be applied. 

A second reason for studying methods that attempt to detect a lack of
randomness in sequences of observations is that such methods may be
superior to the usual static methods for testing certain hypotheses. The
advantage of preserving the time order of observations in testing
hypotheses is demonstrated in Chapter 14 in the material on sequential analysis.

Although a few of the statistical methods in the preceding chapters 
have been based on the order of observations, they assume that the dis- 
tribution of the basic variable is known. For example, the quality-control 
chart technique was applied to such variables as normal and binomial 
variables. In this section a nonparametric method is discussed for testing 
the randomness of sequences of observations. 

Suppose samples are taken every morning and afternoon from a
production line to check on a quality characteristic of the product, say,
the diameter of a part, and suppose that the following diameters are
obtained: .220, .213, .221, .222, .219, .214, .222, .216, .212, .221, .223,
.214, .221, .216, .217, .215. If each value is assigned the letter a,
provided that it is above the median, which will be found to be .218, and
the letter b, provided that it is below .218, this set of values will yield the
following sequence of letters:

(7)    a, b, a, a, a, b, a, b, b, a, a, b, a, b, b, b

Now, a sequence of i identical letters that is preceded and followed by a
different letter or no letter is called a run of length i. The runs of a’s
and b’s in this example have the lengths 1, 1, 3, 1, 1, 2, 2, 1, 1, 3.
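
The conversion of measurements to letters and the counting of runs are easily mechanized. The following Python sketch is an illustration using the sixteen diameters quoted above; dropping any value equal to the median is an assumption of the sketch, although no such value occurs in these data.

```python
from itertools import groupby
from statistics import median

diameters = [.220, .213, .221, .222, .219, .214, .222, .216,
             .212, .221, .223, .214, .221, .216, .217, .215]

m = median(diameters)  # average of the two middle values for an even count

# Letter a for a value above the median, b for a value below it.
# Assumption of this sketch: a value equal to the median would be skipped.
letters = ["a" if d > m else "b" for d in diameters if d != m]

# groupby collects consecutive identical letters, that is, the runs.
run_lengths = [len(list(group)) for _, group in groupby(letters)]
total_runs = len(run_lengths)

print("".join(letters), run_lengths, total_runs)
```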

By studying how runs behave for random sequences, it is possible to 
derive tests for randomness that are based on runs. One of the simplest 
statistics to study in this manner is the total number of runs in the se- 
quence. In the preceding illustration the total number of runs is 10. 
Now, intuitively, one might feel that there are too many runs in a sequence 
of this length, compared to the number of runs expected in a randomly 
selected sequence of the same length. An excessive number of runs might
occur, for example, if there were a tendency for machines to turn out parts
slightly too large in the morning and slightly too small in the afternoon
because then there would be a tendency toward the sequence b, a, b, a,
.... Too few runs might occur if there were a tendency for the machines
gradually to produce larger parts from day to day because then the early
diameters would be mostly below the median, whereas the later diameters
would be mostly above the median. Several very long runs will, of course,
reduce the total number of runs possible for a sequence of given
length.

In order to obtain the frequency function of a statistic such as the
total number of runs, it is necessary to obtain the joint frequency function
of the number of runs of a’s and the number of runs of b’s. Toward this
end, consider a sequence of n letters consisting of $n_a$ a’s and $n_b$ b’s. Let
$r_a$ and $r_b$ denote the number of runs of a’s and b’s, respectively. Now
consider the basic problem of finding the probability that the number of
runs of a’s and of b’s will have specified values.

Let $x_1', x_2', \ldots, x_n'$ denote the random variables whose values are to
be converted to a’s or b’s, depending on whether the value is above or
below the median of the set. These variables are assumed to represent a
random sample of size n from a population with a continuous frequency
function. For ease of explanation, assume for the time being that n is an
even number, say $n = 2k$, so that there will be the same number of values
(k) above and below the median. Now as explained in 13.2, each of the
n! possible permutations of the variables $x_1', x_2', \ldots, x_n'$ has the same
probability of being the ordered set of values denoted by $x_1, x_2, \ldots, x_n$.
From this property, it will follow that every distinct permutation of the
a’s and b’s will have the same probability of occurring. This fact can be
seen in the following manner.

Let b a b b a a ... a denote any permutation of the k a’s and the k b’s
and consider the number of the n! permutations of the x'’s that will
yield this particular permutation of a’s and b’s. The first b in this
permutation means that the first sample value $x_1'$ was smaller than the
median of the set of values; hence $x_1'$ could have occupied any one of the
first k order positions in the ordered set. The first a in this permutation
means that the second sample value $x_2'$ was larger than the median;
hence $x_2'$ could have occupied any one of the last k order positions in the
ordered set. Thus there are k choices of order positions for the first b
and also for the first a. In a similar manner, there are $k - 1$ remaining
choices of order positions for the second b and the second a. Filling
order positions in this manner, there are $k!\,k!$ choices of order positions
for the x'’s to yield the desired arrangement of a’s and b’s. Since this
number of choices does not depend on the particular permutation of
a’s and b’s and since all order permutations of the x'’s have the same
probability of occurring, it follows that all distinct permutations of the
a’s and b’s have the same probability of occurring. The same arguments
apply for the case in which n is an odd number with $n = 2k + 1$.

In view of the preceding discussion, the probability that $r_a$ and $r_b$ will
have specified values is given by the ratio of the number of permutations
of the a’s and b’s possible when $n_a$, $n_b$, $r_a$, and $r_b$ are held fixed to the
number of permutations possible when only $n_a$ and $n_b$ are held fixed.
This assumes that only samples of size n which give rise to $n_a$ a’s and $n_b$ b’s
are being considered and that the two random variables here are $r_a$
and $r_b$.

The denominator of this probability ratio, which is the number of
permutations when $n_a$ and $n_b$ are held fixed, is equal to the number of
ways of permuting n things of which $n_a$ are alike and $n_b$ are alike. From
(18), Chapter 2, the denominator is therefore

(8)    $\dfrac{n!}{n_a!\,n_b!}$


For the purpose of counting the number of permutations when $n_a$ and
$n_b$ and also $r_a$ and $r_b$ are held fixed, concentrate first on the a’s. As far as
runs of a’s are concerned, the b’s in a sequence such as (7) merely serve to
separate the a’s into blocks. In order to simplify matters, these
separations are made by means of vertical bars rather than by means of b’s.
Thus (7) would be designated

a | a a a | a | a a | a

The bar at the end was omitted because it does not separate a’s and thus
does not affect runs of a’s. Since $n_a$ and $r_a$ are being held fixed, the number
of a’s and the number of blocks are fixed. Under these restrictions,
different permutations can be obtained only by moving a’s from one
block to another without destroying any blocks. The number of such
possible permutations can be counted in the following manner.

If the $n_a$ a’s are arranged in a line, they can be separated into $r_a$ blocks
by placing $r_a - 1$ vertical bars in distinct spaces between the a’s. Since
there are $n_a - 1$ spaces between the a’s and $r_a - 1$ of them are to be
chosen, the number of distinct permutations possible is the number of
ways of choosing $r_a - 1$ things from $n_a - 1$ things, which by (17), Chapter
2, is given by

(9)    $\dfrac{(n_a - 1)!}{(r_a - 1)!\,(n_a - r_a)!}$

The same arguments apply to the b’s; hence the number of permutations
of the b’s subject to $n_b$ and $r_b$ being held fixed is

(10)    $\dfrac{(n_b - 1)!}{(r_b - 1)!\,(n_b - r_b)!}$


In order to combine the a’s and b’s, suppose that the sequence begins
with an a. Then, for each permutation of the a’s, the b’s can be permuted
in all possible ways in their blocks to give distinct permutations of the a’s
and b’s jointly; consequently the total number of permutations of the
a’s and b’s together, starting with an a and subject to $r_a$ and $r_b$ being fixed,
is the product of (9) and (10), namely

(11)    $\dfrac{(n_a - 1)!}{(r_a - 1)!\,(n_a - r_a)!} \cdot \dfrac{(n_b - 1)!}{(r_b - 1)!\,(n_b - r_b)!}$

This same result applies if the sequence begins with the letter b. Since
blocks of a’s and b’s alternate, either $r_a = r_b$ or $r_a = r_b \pm 1$. If $r_a = r_b + 1$,
the sequence must begin and end with an a. If $r_a = r_b - 1$, the sequence
must begin and end with a b. However, if $r_a = r_b$, the sequence can begin
with either letter. For the first two cases there is no choice of beginning
letter; hence (11) gives the desired number of permutations. For the
third case the number of permutations is twice as large because one can
fit the a and b blocks together by starting either with the letter a or the
letter b and any permutation beginning with the letter a must be different
from one beginning with the letter b. In every case the number of desired
permutations is given by

(12)    $c\,\dfrac{(n_a - 1)!}{(r_a - 1)!\,(n_a - r_a)!} \cdot \dfrac{(n_b - 1)!}{(r_b - 1)!\,(n_b - r_b)!}$

where $c = 2$ if $r_a = r_b$ and $c = 1$ if $r_a \ne r_b$.

The ratio of (12) and (8) gives the desired probability. The result that
has just been demonstrated may be summarized in the following theorem.

Theorem: If $r_a$ and $r_b$ denote the respective number of runs above and
below the median for a random sample of size n for a continuous variable
x, and if $n_a$ and $n_b$ denote the respective number of values of x above and
below the median, then the joint distribution of $r_a$ and $r_b$ is given by

$f(r_a, r_b) = \dfrac{c\,(n_a - 1)!\,(n_b - 1)!\,n_a!\,n_b!}{(r_a - 1)!\,(r_b - 1)!\,(n_a - r_a)!\,(n_b - r_b)!\,n!}$

where $c = 2$ if $r_a = r_b$ and $c = 1$ if $r_a \ne r_b$.

It will be observed that this theorem is not concerned with the form of
the frequency function of x; consequently, any test derived directly from
this theorem will be a nonparametric test.
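
The counting argument behind the theorem can be confirmed by brute force for a short sequence. In the Python sketch below, which is an illustration only, the names `joint_prob` and `enumerate_runs` and the choice of 4 a’s and 3 b’s are assumptions of the example. The formula is written with binomial coefficients, which is algebraically the same as the factorial form of the theorem, and it is compared with a direct enumeration of all equally likely arrangements.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb


def joint_prob(ra, rb, na, nb):
    """f(r_a, r_b) from the theorem, written with binomial coefficients."""
    if abs(ra - rb) > 1 or not (1 <= ra <= na and 1 <= rb <= nb):
        return Fraction(0)  # run counts that cannot occur
    c = 2 if ra == rb else 1
    ways = c * comb(na - 1, ra - 1) * comb(nb - 1, rb - 1)
    return Fraction(ways, comb(na + nb, na))


def enumerate_runs(na, nb):
    """Relative frequency of each (r_a, r_b) over all distinct arrangements."""
    n = na + nb
    counts = {}
    for a_positions in combinations(range(n), na):
        seq = ["b"] * n
        for i in a_positions:
            seq[i] = "a"
        run_letters = [letter for letter, _ in groupby(seq)]  # one entry per run
        key = (run_letters.count("a"), run_letters.count("b"))
        counts[key] = counts.get(key, 0) + 1
    total = comb(n, na)
    return {key: Fraction(v, total) for key, v in counts.items()}


na, nb = 4, 3  # small case chosen for illustration
enumerated = enumerate_runs(na, nb)
agrees = all(joint_prob(ra, rb, na, nb) == p for (ra, rb), p in enumerated.items())
total_probability = sum(joint_prob(ra, rb, na, nb)
                        for ra in range(1, na + 1) for rb in range(1, nb + 1))

print(agrees, total_probability)
```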

Now consider the problem discussed at the beginning of this section,
namely the problem of obtaining the frequency function for the total
number of runs u when $n_a$ and $n_b$ are held fixed. Since $u = r_a + r_b$, the
probability that u will assume a fixed value is obtained by summing the
probabilities $f(r_a, r_b)$, given by the theorem, for all values of $r_a$ and $r_b$
whose sum is this fixed value. If u is even, $r_a = r_b = u/2$; consequently
there is but one pair of values to be considered. If u is odd, $r_a = (u \pm 1)/2$
and $r_b = (u \mp 1)/2$; consequently there are but two pairs of values to be
considered. If $f(u)$ denotes the desired frequency function, it therefore
follows from the theorem that

$f(u) = \dfrac{2\binom{n_a - 1}{u/2 - 1}\binom{n_b - 1}{u/2 - 1}}{\binom{n}{n_a}}$,    if u is even

and

$f(u) = \dfrac{\binom{n_a - 1}{(u-1)/2}\binom{n_b - 1}{(u-3)/2} + \binom{n_a - 1}{(u-3)/2}\binom{n_b - 1}{(u-1)/2}}{\binom{n}{n_a}}$,    if u is odd


These probabilities have been used to construct tables of $\sum_{u=2}^{u'} f(u)$ for
various values of $n_a$, $n_b$, and $u'$. Such tables enable one to test whether a
sample value of u is unusually large or small compared to what would be
expected if the sequence of values constituted a random sequence. In
order to illustrate the use of such tables, a few entries have been extracted
from one of them and have been recorded in Table 2. In this table $u_{.05}$
and $u_{.95}$ are the largest and smallest integers, respectively, such that
$P\{u \le u_{.05}\} \le .05$ and $P\{u \ge u_{.95}\} \le .05$. These values may therefore be
used as 5 per cent critical values for testing randomness against the
alternative of too few, or too many, runs. Because of the manner in which
$u_{.05}$ and $u_{.95}$ were chosen, a sample value of u equal to either of these critical
values lies in the critical region, and therefore would lead to the rejection
of the hypothesis of randomness. Table 2 requires that $n_a = n_b$; however,
since the median of a set of observations is being chosen for assigning
letters, this requirement will be satisfied, or very nearly so.

Table 2

n_a = n_b     5   10   15   20   25   30   40   50   60   70   80   90  100
u_.05         3    6   11   15   19   24   33   42   51             79   88
u_.95         8   15        26   32   37   48   59        81   91       113
u_.025        2    6        14   18   22   31        49   58   68   77   86
u_.975        9   15   21   27   33   39        61   72   83   93  104  115

As a numerical illustration, consider the data introduced at the
beginning of this section. There $u = 10$, $n_a = 8$, and $n_b = 8$. Suppose that there
is reason to believe that the diameters of parts may vary from morning
to afternoon so that there may be too many runs. In order to test for
randomness against this possibility, the right tail of the distribution is
chosen as the critical region. Interpolating in the $u_{.95}$ row of Table 2,
it will be found that the critical value of u is approximately 12.2. Thus,
$u = 10$ is not large enough to refute randomness here. A considerably
larger sequence might reveal a lack of randomness of the type conjectured,
if such a lack exists.
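
For samples as small as these the tail probability can be computed exactly instead of by interpolation. The Python sketch below is an illustration; the function name `run_count_pmf` is an assumption of the example, and the frequency function of u is written with binomial coefficients, the even and odd cases corresponding to one or two admissible pairs of run counts. The sketch evaluates the right-tail probability of 10 or more runs when there are 8 a’s and 8 b’s.

```python
from fractions import Fraction
from math import comb


def run_count_pmf(u, na, nb):
    """f(u): probability of exactly u runs, given n_a a's and n_b b's."""
    total = comb(na + nb, na)
    if u % 2 == 0:
        k = u // 2  # r_a = r_b = k; the sequence may begin with either letter
        ways = 2 * comb(na - 1, k - 1) * comb(nb - 1, k - 1)
    else:
        k = (u - 1) // 2  # run counts are (k + 1, k) or (k, k + 1)
        ways = (comb(na - 1, k) * comb(nb - 1, k - 1)
                + comb(na - 1, k - 1) * comb(nb - 1, k))
    return Fraction(ways, total)


na, nb, u_observed = 8, 8, 10

# The total number of runs ranges from 2 to n_a + n_b.
check_total = sum(run_count_pmf(u, na, nb) for u in range(2, na + nb + 1))

# Right tail: probability of at least as many runs as were observed.
p_right = sum(run_count_pmf(u, na, nb) for u in range(u_observed, na + nb + 1))

print(check_total, p_right, float(p_right))
```

A right-tail probability of this size is far from the 5 per cent level, in agreement with the conclusion that 10 runs do not refute randomness.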

As a second illustration, consider the data of problem 57, Chapter 5.
There are reasons for believing that the expected percentage may be
shifting here from time to time; hence consider testing the hypothesis
of randomness against the possibility of too many long runs. If these
percentages are assigned letters on the basis of lying above or below the
median 2.5, with values equal to the median ignored, it will be found that
the resulting sequence of a’s and b’s is

b b b b a b b b a b a a a a a a a a
a a b a b b b b b b b b a a b a a a

Here $n_a = 18$, $n_b = 18$, and $u = 12$. Interpolation in Table 2 gives
$u_{.05} = 13.4$. Since $u < u_{.05}$, this result is significant at the 5 per cent level;
therefore, the hypothesis of randomness is rejected. There appear to be
too few runs because of too many very long runs; consequently an
investigation of the long runs should be made to determine the cause of
nonrandomness.
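
The same conclusion can be reached without interpolation. The Python sketch below is an illustration; the helper names `run_count_pmf` and `lower_critical` are assumptions of the example. It counts the runs in the letter sequence of this illustration, computes the exact left-tail probability, and finds the largest u whose cumulative probability does not exceed .05, which is the definition used for the lower critical value.

```python
from fractions import Fraction
from itertools import groupby
from math import comb


def run_count_pmf(u, na, nb):
    """f(u): probability of exactly u runs, given n_a a's and n_b b's."""
    total = comb(na + nb, na)
    if u % 2 == 0:
        k = u // 2
        ways = 2 * comb(na - 1, k - 1) * comb(nb - 1, k - 1)
    else:
        k = (u - 1) // 2
        ways = (comb(na - 1, k) * comb(nb - 1, k - 1)
                + comb(na - 1, k - 1) * comb(nb - 1, k))
    return Fraction(ways, total)


def lower_critical(na, nb, alpha=Fraction(5, 100)):
    """Largest u with P{U <= u} <= alpha, or None if there is no such u."""
    cumulative, best = Fraction(0), None
    for u in range(2, na + nb + 1):
        cumulative += run_count_pmf(u, na, nb)
        if cumulative > alpha:
            break
        best = u
    return best


# Letter sequence of the second illustration, written without spaces.
sequence = "bbbbabbbabaaaaaaaa" + "aababbbbbbbbaabaaa"
na, nb = sequence.count("a"), sequence.count("b")

run_lengths = [len(list(group)) for _, group in groupby(sequence)]
u = len(run_lengths)         # total number of runs
longest = max(run_lengths)   # length of the longest run

# Left tail: probability of as few runs as were observed, or fewer.
p_left = sum(run_count_pmf(k, na, nb) for k in range(2, u + 1))

print(na, nb, u, longest, float(p_left), lower_critical(na, nb))
```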

The preceding theory depends only on the assumption that all distinct
permutations of the a’s and b’s are equally likely and does not require
that the a’s and b’s be obtained from measurements on some continuous
variable. Thus the theory can be applied, for example, to such problems
as determining whether a group of men and women seated along a lunch
counter is arranged in a random order.

Several other tests for randomness are based on functions of runs. 
One such test, for example, is based on the probability of obtaining at 
least one run of a length greater than a specified length. Such a test might 
be helpful in the problem just considered because a run of length 10 for 
such a short sequence seems unlikely. 

The foregoing test based on the total number of runs is a poor test in
many respects. It is effective only when the lack of randomness shows up
in producing too many or too few runs. There are many types of
nonrandomness that produce the correct number of total runs associated
with a random sequence. Tests based on counting the number of runs of
various lengths are less likely to be deceived.

13.4 Serial Correlation 

As explained in the preceding paragraph, the test for randomness
based on total runs possesses weaknesses. In particular it is not likely
to discover certain types of nonrandomness of a cyclical nature unless
the observations are spaced just right. There are many other types of
nonrandomness that may occur but will be undetected by the runs test
because the total number of runs is approximately equal to the number
expected for a random sequence.

For data that possess cyclical features it is to be expected that a test 
based on correlation would be more effective than the runs test in dis- 
covering such features. If observations have been ordered with respect 
to time and time is irrelevant, no correlation would be expected to exist, 
for example, between successive pairs of values of the sequence. How- 
ever, if there is a cyclical movement in the sequence, neighboring pairs of 
values will tend to be high or low together and thus produce a value of 
the correlation coefficient that differs significantly from zero. If the 
frequency function of a correlation coefficient of this type could be found, 
it would be possible to test the hypothesis that the population correlation 
is zero and in this sense test the sequence for randomness. The derivation 
of such frequency functions is complicated; consequently only the result
of one such derivation is described here.

Let $x_1, x_2, \ldots, x_n$ denote the sequence to be tested for randomness
and consider the ordinary correlation coefficient calculated for this
sequence when $y_i$ is chosen as $x_{i+1}$ for $i = 1, 2, \ldots, n - 1$. With
this choice for $y$, the corresponding values of $x$ and $y$ are those
indicated in Table 3.

The correlation coefficient of the values given in Table 3 is called the
serial correlation coefficient with lag 1. If $y_i$ had been chosen equal to
$x_{i+k}$, the correlation coefficient of the resulting $x$ and $y$ values would
have been called the serial correlation coefficient with lag $k$.

If the sequence $x_1, x_2, \ldots, x_n$ could be treated as a random sample of
size $n$ from a normal population, one would expect to be able to apply
some normal distribution theory such as the regression theory of Chapter
11 to solve the present problem. However, since the $y_i$ no longer
constitute a set of random sample values for a fixed set of $x$'s, nor do the
pairs of values $(x_i, y_i)$ constitute a set of random sample values from a
joint distribution, the ordinary regression and correlation theory is not
applicable here. Furthermore, since this chapter is concerned with methods
that do not require a normality or similar assumption, the methods of
Chapter 11 would not be available for this reason also.


Table 3

x    $x_1$    $x_2$    ...    $x_i$        ...    $x_{n-1}$
y    $x_2$    $x_3$    ...    $x_{i+1}$    ...    $x_n$


A nonparametric method based on serial correlation can be devised if
it is assumed that all permutations of the sequence being considered are
equally probable. For each such permutation, one can calculate the value
of the serial correlation coefficient. Since there are $n!$ permutations
possible, there are $n!$ values of the serial correlation coefficient to be
computed. The ordered set of values obtained, together with the
frequencies of those values that are obtained more than once, gives the
distribution of the serial correlation coefficient with respect to the set of
$n!$ permutations of the sequence. For most sequences one would expect the
distribution to be fairly symmetrical and to be centered near the origin.
If the sequence being tested yielded a large positive or negative value of
the serial correlation coefficient, its randomness would be questioned.
In order to obtain a critical region for testing randomness, it would be
necessary to find two values of the serial correlation coefficient, one for
each tail of the distribution, such that say 5 per cent of the $n!$ values of
the serial correlation coefficient lay outside the interval determined by the
two values.
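
The enumeration just described can be sketched in a few lines for a very
short sequence. This is only an illustration under stated assumptions: the
seven sample values are hypothetical, the function names are not from the
text, and the tail cut-off rule (dropping the same number of values from each
tail) is one simple choice among several.

```python
from itertools import permutations
from math import sqrt


def serial_correlation(x):
    """Ordinary correlation coefficient of the pairs (x_i, x_{i+1}), lag 1."""
    a, b = x[:-1], x[1:]
    m = len(a)
    mean_a, mean_b = sum(a) / m, sum(b) / m
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def permutation_distribution(x):
    """Sorted serial correlations over all n! orderings (small n only)."""
    return sorted(serial_correlation(p) for p in permutations(x))


def two_sided_critical_values(dist, alpha=0.05):
    """Two values leaving at most alpha/2 of the n! values in each tail."""
    k = int(len(dist) * alpha / 2)
    return dist[k], dist[-k - 1]


# Hypothetical sequence of 7 distinct values: 7! = 5040 permutations.
sample = [3, 8, 1, 9, 4, 7, 2]
dist = permutation_distribution(sample)
low, high = two_sided_critical_values(dist)
observed = serial_correlation(sample)
print(len(dist), round(low, 3), round(high, 3), round(observed, 3))
```

Randomness would be questioned if the observed value fell outside the
interval from `low` to `high`. The number of permutations grows as $n!$, so
this direct approach is practical only for very short sequences.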

It is obvious from the preceding discussion that the computational
difficulties of the proposed test become prohibitive for $n$ at all large; hence
it is necessary to find an approximation for the distribution of the serial
correlation coefficient when $n$ is large. This problem is considered next.

For ease of discussion, let $y_n$ be defined to be $x_1$. Then Table 3 will
contain $n$ pairs of values rather than $n - 1$ pairs. The resulting
correlation coefficient is called the circular form of the serial correlation
coefficient. For the extended table the serial correlation coefficient may
be expressed in the form

$$r = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n s_x s_y}$$

Since all $n$ values of the sequence occur in both rows of the extended
Table 3 and the statistics $\bar{x}$, $\bar{y}$, $s_x$, and $s_y$ are independent
of the order of the sample values, it follows that $\bar{x}$, $\bar{y}$, $s_x$,
and $s_y$ are unchanged under permutations of the sequence. Now the only
quantity in $r$ that is affected by permutations of the sequence is the sum
$\sum_{i=1}^{n} x_i y_i$; therefore it suffices to study the distribution of this
sum rather than the distribution of $r$ itself. Furthermore, as $n$ becomes
large, any differences in the distribution of $r$, or of this sum, for the
standard definition and the circular definition of the serial correlation
coefficient disappear, and it is enough to consider the statistic

$$R = \sum_{i=1}^{n} x_i x_{i+1}$$

where $x_{n+1}$ is taken to be $x_1$.



If it is assumed that the values of the sequence being tested constitute 
a random sample from a population that possesses low order moments, 
then it can be shown that the random variable R has an approximate 
normal distribution for large n. In order to test the hypothesis of zero 
serial correlation, one must know the mean and variance of R. The 
necessary values are given by the formulas 

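$$E(R) = \frac{S_1^2 - S_2}{n - 1}$$

and

$$\sigma_R^2 = \frac{S_2^2 - S_4}{n - 1} + \frac{S_1^4 - 4S_1^2 S_2 + 4S_1 S_3 + S_2^2 - 2S_4}{(n - 1)(n - 2)} - E^2(R)$$

where

$$S_k = x_1^k + x_2^k + \cdots + x_n^k$$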

Unfortunately, the computations involved in evaluating the mean and 
variance are somewhat lengthy if n is at all large. Since the test is not 
affected by adding the same constant to each member of the sequence or 
multiplying each member by the same constant, these computations can 
be simplified considerably by replacing the observed values by reduced 
values. 

As an illustration of how the modified serial correlation test is applied,
consider the first sequence used in 13.3 to illustrate the run test. If .218
is subtracted from each value and the resulting values are multiplied by
1000, the following sequence will be obtained:

2, -5, 3, 4, 1, -4, 4, -2, -6, 3, 5, -4, 3, -2, -1, -3

Computations yield the values

$$S_1 = -2, \quad S_2 = 200, \quad S_3 = -170, \quad S_4 = 3944$$

If these values are substituted in the formulas for $E(R)$ and $\sigma_R^2$ it
will be found that

$$E(R) = -13.1 \quad \text{and} \quad \sigma_R = 48.7$$

It will also be found that $R = -67$, hence that

$$t = \frac{R - E(R)}{\sigma_R} = -1.11$$

If one wishes to test for randomness against possible variation from
morning to afternoon, as was done earlier, one would test for zero
correlation against possible negative correlation because alternating high
and low values of $x$ will produce negative serial correlation. Thus in this
problem, assuming that the normal approximation is satisfactory, one would
choose the left tail of the approximating normal curve as the critical
region. Since, for a normal variable,

$$P\{t \le -1.11\} = .13$$

it follows that the hypothesis of randomness would not be rejected.
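
The arithmetic of this illustration can be checked with a short sketch. It
assumes the circular form of $R$ (the last value is paired with the first) and
a variance formula whose final term is the square of $E(R)$; with those
assumptions the printed values $-13.1$, $48.7$, and $-1.11$ are reproduced.
The function name is illustrative only.

```python
from math import erf, sqrt


def serial_correlation_test(x):
    """Return R, E(R), sigma_R, t and the left-tail normal probability."""
    n = len(x)
    # Power sums S_k = x_1^k + ... + x_n^k for k = 1, 2, 3, 4.
    s1, s2, s3, s4 = (sum(v ** k for v in x) for k in (1, 2, 3, 4))
    # Circular lag-1 statistic: x_n is multiplied by x_1.
    r = sum(x[i] * x[(i + 1) % n] for i in range(n))
    mean_r = (s1 ** 2 - s2) / (n - 1)
    var_r = ((s2 ** 2 - s4) / (n - 1)
             + (s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4)
             / ((n - 1) * (n - 2))
             - mean_r ** 2)
    sigma_r = sqrt(var_r)
    t = (r - mean_r) / sigma_r
    left_tail = 0.5 * (1 + erf(t / sqrt(2)))  # standard normal P{T <= t}
    return r, mean_r, sigma_r, t, left_tail


reduced = [2, -5, 3, 4, 1, -4, 4, -2, -6, 3, 5, -4, 3, -2, -1, -3]
r, mean_r, sigma_r, t, left_tail = serial_correlation_test(reduced)
print(r, round(mean_r, 1), round(sigma_r, 1), round(t, 2), round(left_tail, 2))
```

For another lag $k$, the same sketch applies after replacing `(i + 1) % n` by
`(i + k) % n`, since the mean and variance depend only on the power sums.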

It should be pointed out that the preceding theory was discussed from
the point of view of the serial correlation with lag 1; however, it is
applicable to other lags also. One calculates the corresponding value of $R$
and performs the test as usual.

The preceding two sections have been concerned with two nonparametric
tests for deciding whether a sequence in time is random. The
problem of discovering a lack of randomness in time data and the nature 
of it is a very important and difficult problem in statistics. The two 
techniques presented here are two of the simplest available to describe and 
are intended only as a mild introduction to one feature of the analysis of 
time series. 

The tests in this and the preceding section were constructed on what is
known as the randomization principle. In the ordinary test a statistic such
as $t$ or $F$ is chosen and then its sampling distribution in repeated sampling
is found in order to determine a critical region for the test. A test based
on the randomization principle is constructed in much the same manner,
except that in determining the critical region one considers the distribution
of the statistic under all possible permutations of the observed values
that are compatible with the hypothesis. Thus one does not compare the
sample value of a statistic with its possible values under repeated sampling
experiments but rather with the possible values under all possible
permutations of the values that were actually observed. This principle can be
used to construct nonparametric versions of the standard parametric
statistics such as $t$ or $F$. The difficulty is to find the distribution of such
statistics under randomization, even approximately, so that a critical
region can be determined.


13.5 Kolmogorov-Smirnov Statistic

The preceding nonparametric methods have been concerned with testing 
hypotheses. In this section a technique for finding a confidence band for 
the distribution function of a continuous variable is presented. By 
modifying this technique slightly, it can also be used to test hypotheses of 
the type treated in 13.1 and 13.2. 

As before, let a random sample of size $n$ be taken from a population
with distribution function $F(x)$ and let $x_1, x_2, \ldots, x_n$ denote the
ordered sample. The problem now is to use this ordered sample to obtain
a confidence band for $F(x)$. It should be noted that it is the distribution
function $F(x)$ and not the frequency function $f(x)$ that is being considered
here.

The desired method, which was first presented by the two Russian 
mathematicians whose names are attached to it, consists in using the 
ordered sample to construct an upper and lower step function such that 
F(x) will be contained between them with a specified probability. Toward 
this end, consider the sample distribution function, which is a step func- 
tion, given by the formula 


$$S_n(x) = \begin{cases} 0, & x < x_1 \\ \dfrac{k}{n}, & x_k \le x < x_{k+1} \\ 1, & x \ge x_n \end{cases}$$

The graph of this function for a typical sample, together with the graph of
a typical $F(x)$, is shown in Fig. 2.

Now suppose that $F(x)$ is known. Then it would be possible to calculate
the value of $|F(x) - S_n(x)|$ for any desired value of $x$. Furthermore,
it is clear from Fig. 2 that it would be possible to calculate the value of
the quantity

$$\max_x |F(x) - S_n(x)|$$

which is the maximum vertical distance between the graphs of $F(x)$ and
$S_n(x)$ over the range of possible $x$ values. It can be shown that the
distribution of this maximum distance does not depend on $F(x)$. As a
consequence, this quantity, which is denoted by $D_n$, can be used as a
nonparametric variable for constructing a confidence band for $F(x)$.
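
As a small sketch of these two definitions, the following computes the
sample distribution function and the maximum vertical distance for a known
continuous $F(x)$. The three sample values and the uniform distribution used
in the example are hypothetical, and the function names are illustrative.
Because $S_n(x)$ is a step function and $F(x)$ is nondecreasing, the maximum
distance is attained at a sample point, either at the top or at the foot of
the step there, so only those points need to be examined.

```python
from bisect import bisect_right


def sample_distribution_function(sample):
    """Return S_n as a function: the proportion of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def s_n(x):
        return bisect_right(ordered, x) / n

    return s_n


def max_distance(sample, cdf):
    """Largest vertical distance between cdf(x) and S_n(x) over all x."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for k, x in enumerate(ordered, start=1):
        f = cdf(x)
        # Compare F with the step heights just after and just before x.
        d = max(d, abs(k / n - f), abs(f - (k - 1) / n))
    return d


def uniform_cdf(x):
    """Distribution function of a uniform variable on (0, 1)."""
    return min(1.0, max(0.0, x))


sample = [0.4, 0.1, 0.7]  # hypothetical observations
s_n = sample_distribution_function(sample)
print(s_n(0.05), s_n(0.4), s_n(1.0))
print(max_distance(sample, uniform_cdf))
```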




Since $S_n(x)$ varies from sample to sample, $D_n$ is obviously a random
variable. In order to use it as a tool for finding a confidence band for
$F(x)$, it is necessary to find its distribution. This distribution can be
worked out numerically for any particular value of $n$ by using
combinatorial methods that are too lengthy and involved to be presented here.
Certain critical values of this distribution, however, are given in Table VIII
in Appendix 2. Let $D_n^{\alpha}$ denote such a critical value that satisfies the
relation

$$P\{D_n \le D_n^{\alpha}\} = 1 - \alpha \tag{13}$$

In view of the definition of $D_n$ and (13), the following successive
equalities can be written down:

$$1 - \alpha = P\{\max_x |F(x) - S_n(x)| \le D_n^{\alpha}\}$$
$$= P\{|F(x) - S_n(x)| \le D_n^{\alpha} \text{ for all } x\}$$
$$= P\{S_n(x) - D_n^{\alpha} \le F(x) \le S_n(x) + D_n^{\alpha} \text{ for all } x\}$$

This last equality shows that the two step functions, $S_n(x) + D_n^{\alpha}$ and
$S_n(x) - D_n^{\alpha}$, yield a confidence band with confidence coefficient
$1 - \alpha$ for the unknown distribution function $F(x)$.

To illustrate the preceding technique, the sample values for the variable
$x$ in Table 1 are employed to construct a 95 per cent confidence band for
$F(x)$. The ordered values of this sample are 8.2, 10.4, 10.6, 11.5, 12.6,
12.9, 13.3, 13.3, 13.4, 13.4, 13.6, 13.8, 14.0, 14.0, 14.1, 14.2, 14.6, 14.7,
14.9, 15.0, 15.4, 15.6, 15.9, 16.0, 16.2, 16.3, 17.2, 17.4, 17.7, 18.1. From
Table VIII it will be found that the value of $D_{30}^{.05}$ is .24; consequently
this is the value that must be added to and subtracted from the sample
distribution function $S_{30}(x)$ to yield the desired confidence band. Since the
step function $S_{30}(x)$ increases by the amount $\frac{1}{30}$ at each distinct
sample point (and by $\frac{2}{30}$ where two sample values coincide), it is
easily constructed. The graph of $S_{30}(x)$, together with the graphs of the
step functions that determine the desired confidence band, is shown in
Fig. 3. Vertical lines have been added to the confidence band step function
graphs for better delineation.
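
A minimal sketch of this construction follows. It takes the 30 ordered
values as listed above and the critical value .24 quoted from Table VIII; the
function name is illustrative, and clipping the band to the interval from 0 to
1 is an added convenience rather than part of the text's construction (a
distribution function cannot leave that interval in any case).

```python
from bisect import bisect_right


def confidence_band(sample, d_crit):
    """Return a function giving (lower, upper) band values at any x."""
    ordered = sorted(sample)
    n = len(ordered)

    def band(x):
        s = bisect_right(ordered, x) / n  # sample distribution function
        return max(0.0, s - d_crit), min(1.0, s + d_crit)

    return band


sample = [8.2, 10.4, 10.6, 11.5, 12.6, 12.9, 13.3, 13.3, 13.4, 13.4,
          13.6, 13.8, 14.0, 14.0, 14.1, 14.2, 14.6, 14.7, 14.9, 15.0,
          15.4, 15.6, 15.9, 16.0, 16.2, 16.3, 17.2, 17.4, 17.7, 18.1]

band = confidence_band(sample, 0.24)  # .24 is the 95 per cent value for n = 30
for x in (10.0, 13.3, 15.0, 18.1):
    lower, upper = band(x)
    print(x, round(lower, 3), round(upper, 3))
```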

The statistic $D_n$ can also be used to test the hypothesis that a random
sample came from a population with a specified distribution function.
This is accomplished by calculating the maximum difference between the
hypothetical distribution function, say $F_0(x)$, and the sample distribution
function $S_n(x)$, and then determining whether this difference exceeds the
critical value given in Table VIII. This use of the statistic yields another
method for solving the “goodness of fit” problem that was treated in
Chapter 10 by means of the $\chi^2$ test. The statistic $D_n$ possesses the
advantage that it is an exact method, whereas the $\chi^2$ method is valid only
for fairly large samples. There is no such restriction here as requiring all
cell frequencies to exceed 5 as in the case of the $\chi^2$ test because there is
no necessity to classify the observations in carrying out the present test.
The data may be classified, but then the test is no longer an exact one
because the maximum difference for classified and unclassified data may
not be the same; however, the discrepancy is usually slight if the
classification is not too coarse.

The problem of testing “goodness of fit” is ordinarily a parametric 
type problem, and therefore strictly speaking it does not belong in this 
chapter; however, it is included here because it arises naturally in a
discussion of confidence band methods. Furthermore, the test based on $D_n$
possesses such striking advantages over the $\chi^2$ test in certain respects
that it is important to make this test available to students.

As an illustration of how $D_n$ is used to test a hypothetical distribution,
consider the problem in 10.5 in which the $\chi^2$ distribution was used to test
whether the observed and normal curve frequencies of Table 2, Chapter
5, are compatible. Since the normal distribution parameters are estimated
from the data, the test based on $D_n$ will not be an exact test here. In
carrying out the test it is first necessary to accumulate the observed and
expected class frequencies in Table 2 for consecutive intervals in order to
obtain distribution function values. Such calculations yielded the
following values, in which the theoretical frequencies have been rounded off
to the nearest integer.

Observed       6   34  122  302  549  809  942  984  995  1000
Theoretical    6   34  123  309  564  794  932  984  997   999

If these values are divided by 1000, they will give the desired approximate
distribution function values. An inspection of this table of values will
show that the maximum difference is .015, which occurs in both the fifth
and sixth sets of cells. Thus $D_n = .015$ here. From Table VIII it will be
observed that the .05 critical value of $D_n$ is given by

$$\frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{1000}} = .043$$

Since $D_n = .015$ is considerably smaller than this critical value, the normal
curve fit would be judged to be satisfactory. This may be a questionable
conclusion because the value of $D_n$ is likely to be somewhat smaller than
normal when distribution parameters are estimated from data. Since the
test based on $D_n$ is valid only for unclassified data and hence is
inappropriate here, this illustration and similar exercises should be treated
as merely exercises in applying the formulas.
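
The computation in this illustration can be reproduced with a brief sketch.
The cumulative frequencies are those listed above, and the large-sample
critical value $1.36/\sqrt{n}$ is the one quoted from Table VIII; the function
and variable names are illustrative only.

```python
from math import sqrt


def classified_max_difference(observed_cum, theoretical_cum, n):
    """Largest difference between two cumulative frequency rows, as a proportion."""
    return max(abs(o - t) for o, t in zip(observed_cum, theoretical_cum)) / n


observed_cum = [6, 34, 122, 302, 549, 809, 942, 984, 995, 1000]
theoretical_cum = [6, 34, 123, 309, 564, 794, 932, 984, 997, 999]
n = 1000

d_n = classified_max_difference(observed_cum, theoretical_cum, n)
critical = 1.36 / sqrt(n)  # .05 critical value for large n
print(d_n, round(critical, 3), d_n < critical)
```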

The preceding methods can also be adapted to testing the hypothesis
$H_0: f_1(x) = f_2(x)$; however, since they do not seem to possess any
advantage over other nonparametric methods available for treating this
problem, they are not considered here.

The nonparametric methods that have been presented in this chapter
were constructed on an intuitive basis. The critical region for each test
was chosen by analogy with the critical region selected in a similar
parametric problem. A theory of “best tests” has not yet been developed for
nonparametric methods; therefore it is necessary to rely heavily on
intuition and attempt to show that the nonparametric test selected is superior
to other available tests of this type for the problem being considered.


REFERENCES 

Additional nonparametric tests may be found in the following books:

Dixon and Massey, An Introduction to Statistical Analysis, McGraw-Hill Book Co.

Siegel, S., Nonparametric Statistics, McGraw-Hill Book Co.

The theory related to the rank sum test is given in H. B. Mann and D. R. Whitney,
“On a Test of Whether One of Two Random Variables is Stochastically Larger than the
Other,” Annals of Mathematical Statistics, 18, 50-60.

The tables from which Table 2 was extracted are available in F. Swed and C. Eisenhart,
“Tables for Testing Randomness of Grouping in a Sequence of Alternatives,” Annals of
Mathematical Statistics, 14, 66-87.

The derivation of the formulas for serial correlation is based on advanced mathematics. 
It will be found in A. Wald and J. Wolfowitz, “An Exact Test for Randomness in the 
Non-Parametric Case Based on Serial Correlation,” Annals of Mathematical Statistics, 
14, 378-388. 

EXERCISES 

1. Find the probability that (a) the larger of 2 observations taken from a
continuous distribution will exceed the true median and (b) the smaller of 2
observations will exceed the median.

2. Using the frequency function for the smallest and largest observations in a
sample of size $n$, derive the frequency function for the smallest observation.

3. In an elementary school 17 pairs of first grade children were formed on the
basis of similarity of intelligence and background. One child of each pair was
taught to read by method I and the other child by method II. After a period of
training, the children were given a reading test with the following results.

Method I     65  68  70  63  64  62  73  75  72  78  64  73  79  80  67  74  82
Method II    63  68  68  60  65  60  72  75  73  70  66  70  77  78  63  74  78

Using the sign test and ignoring ties, test to see whether the methods are equally
effective.

4. Use the sign test to work problem 16, Chapter 11. 

5. Work problem 3 by means of the rank sum test. Why is the rank sum test 
not strictly applicable to a problem such as this? 

6. Compare the results in problems 3 and 5 with those obtained by applying 
the t test to the differences of paired values. 

7. Take random samples of size 10 each from the two horizontal distributions
given by $f_1(x) = 1$, $0 < x < 1$ and $f_2(x) = 2$, $0 < x < \frac{1}{2}$ by
choosing the proper sets of numbers from the table of random numbers. Test
the hypothesis $H_0: f_1(x) = f_2(x)$ by (a) pairing values and applying the
sign test (b) applying the rank sum test.

8. The following 2 sets of observed values were obtained from sampling 2
populations. Using the rank sum test, test the hypothesis that the 2
populations possess the same frequency function.

I     25  30  28  34  24  25  13  32  24  30  31  35
II    44  34  22   8  47  31  40  30  32  35  18  21  35  29  22


9. Test a set of 200 random digits for randomness by means of runs. 

10. A row of snapdragon plants was inspected for rust. The sequence of
healthy and infected plants was as follows: H H H H I H I I I H I I I H H H H H
H H I I I I. Use total runs to test for randomness of the infection.

11. Toss a coin 50 times, recording the sequence of heads and tails, and then
test for randomness by means of total runs.

12. Write down a sequence of a’s and b's totaling 50 letters that you feel is 
random. Test the randomness by total runs. 

13. Given the sequence of numbers 1, 1, 3, 3, 1, 1, 3, 3, 1, 1, 3, 3, 1, 1, 3, 3, 
1, 1, 3, 3, would you expect total runs to show up the lack of randomness 
here? 

14. The following data give the number of defective bricks for samples of 100
each from day to day. Test for homogeneity of quality from day to day by using
total runs above and below the median 12.5. Read down consecutive columns
to obtain consecutive counts.


11 

12 

8 

13 

16 

11 

8 

12 

12 

10 

14 

16 

21 

9 

8 

18 

10 

8 

14 

13 

9 

12

14 

21 

9 

7 

13 

16 

13 

12 

16 

18 

19 

13 

13 

15 

11 

8 

9 

17 


15. Alternate the two sets of values obtained in problem 7 to obtain a sequence
of 20 values. Test for randomness by means of (a) runs (b) serial correlation.

16. Obtain the annual rainfall records for your community for the last 50
years and apply the 2 tests for randomness to the data.

17. Name 2 lags that would be particularly effective in using serial correlation
to detect the lack of randomness in the sequence given in problem 13.

18. Test the following set of measurements for a trend by means of serial
correlation, applying the formulas to the measurements after they have been
reduced by the subtraction of 28. The measurements are 28, 32, 37, 25, 31, 29,
33, 28, 27, 28, 23, 22, 18, 17.

19. Prove that the serial correlation test is unaffected by subtracting the same
constant from each observed value.

20. Prove that the serial correlation test is unaffected by multiplying each
observed value by the same constant.

21. Work problem 7, Chapter 10, by means of the Kolmogorov-Smirnov test.

22. Test the goodness of fit in problem 44, Chapter 5, by means of the
Kolmogorov-Smirnov test.

23. Find an 80 per cent confidence band for the distribution function
corresponding to the y values of Table 1 in 13.1.

24. Find a 95 per cent confidence band for the distribution function
corresponding to the data of problem 7, Chapter 10.



CHAPTER 14 


Other Methods 


The hypothesis testing methods that have been presented thus far 
possess two restrictive characteristics. They are based on the assumption 
that a sample of fixed size is to be taken and that a choice is to be made in 
favor of one of two possible decisions. 

If samples can be taken one at a time and the information from them 
accumulated, one would expect to be in a better position to make decisions 
than if no attempt were made to look at the data until a sample of fixed 
size had been taken. There are methods available, known as sequential 
methods, that operate on this accumulation-of-information basis and that 
require considerably less sampling on the average than the fixed-size 
sample methods. 

The restriction that only two choices for decision making are possible 
can be bothersome in a problem in which there are several natural choices 
available. Thus, in the study of blood types there are four natural cate- 
gories, and it would be unrealistic to reduce them to two because one’s 
statistical techniques were designed for only two possibilities. There are 
methods, known as multiple decision methods, for treating such more 
general problems. 

The material in this chapter is devoted principally to explaining some 
of the basic but elementary ideas in these two new decision-making 
methods. 


14.1 Sequential Analysis 

Sequential methods possess striking advantages for testing hypotheses; 
therefore they are discussed here from that point of view. 

In testing a hypothesis, the sequential method gives a rule of procedure 
for making one of the following three decisions at each stage of the 
experiment: (1) accept the hypothesis, (2) reject the hypothesis, or (3) 
continue the experiment by taking an additional observation. 


For the purpose of describing a sequential test, consider a single
continuous variable $x$ whose frequency function $f(x; \theta)$ depends on the
single parameter $\theta$. Although the sequential test about to be described
may be applied to either discrete or continuous variables, the description
is given for a continuous variable.

Let the hypothesis to be tested be

$$H_0: \theta = \theta_0$$

and let the alternative hypothesis be

$$H_1: \theta = \theta_1$$

Since a simple hypothesis is being tested against a simple alternative,
the Neyman-Pearson lemma given in 9.1.3 would suggest using the
likelihood ratio

$$\frac{\prod_{i=1}^{n} f(x_i; \theta_1)}{\prod_{i=1}^{n} f(x_i; \theta_0)}$$

as a basis for deciding between $H_0$ and $H_1$. For a fixed-size sample of
size $n$, the Neyman-Pearson method chooses as critical region those
sample points for which this likelihood ratio is larger than a certain
constant $k$. The region in which this ratio is smaller than $k$ would then
constitute the region for accepting $H_0$. A sequential test can be
constructed by extending this fixed-size sample method slightly to include a
region for continuing sampling.

In discussing sequential methods, it is convenient to use the letter $m$
in place of $n$ to denote the size of a sample in order to distinguish it from
the fixed $n$ situation. The letter $n$ is reserved for the size sample that is
required to reach a final decision. As a consequence, $n$ is a random
variable in sequential methods. Another convenient symbol is $p_{im}$ to
denote the likelihood function when $H_i$ is true and a sample of size $m$ is
taken. Now consider the likelihood ratio

$$\frac{p_{1m}}{p_{0m}} = \frac{\prod_{i=1}^{m} f(x_i; \theta_1)}{\prod_{i=1}^{m} f(x_i; \theta_0)}, \quad (m = 1, 2, \ldots) \tag{1}$$

By analogy with the fixed-size sample test, one would choose as the
region for accepting $H_0$ those sample points for which (1) is small and
as the region for accepting $H_1$ those sample points for which (1) is large.
The new idea in sequential testing is to use part of the sample space for a
third region such that if the sample point falls in this region the decision
to accept $H_0$ or $H_1$ will be postponed. From the preceding remarks, this
postponement region should consist of those points for which (1) is neither
small nor large. Thus in the sequential test being described two numbers
$c_1$ and $c_2$ are chosen and successive observations are taken,
$m = 1, 2, \ldots$, as long as

$$c_1 < \frac{p_{1m}}{p_{0m}} < c_2$$

However, whenever $p_{1m}/p_{0m} \le c_1$, sampling ceases and the decision is
made to accept $H_0$, and, whenever $p_{1m}/p_{0m} \ge c_2$, sampling ceases and
the decision is made to accept $H_1$.

Now it can be shown that if $c_1$ and $c_2$ are chosen properly this sequential
test will have prescribed values, $\alpha$ and $\beta$, for the two types of error.
The exact values of $c_1$ and $c_2$ are not available; however, excellent
approximations are given by choosing

$$c_1 = \frac{\beta}{1 - \alpha} \quad \text{and} \quad c_2 = \frac{1 - \beta}{\alpha} \tag{2}$$

A justification for these approximations is given in 14.1.1. With these
choices for $c_1$ and $c_2$, the test is now complete. The name given to this
test and the technique for carrying it out may be expressed as follows.

(3) Sequential Probability Ratio Test: To test the hypothesis
$H_0: \theta = \theta_0$ against the alternative $H_1: \theta = \theta_1$, calculate
the likelihood ratio $p_{1m}/p_{0m}$ and proceed as follows:

(i) if $\frac{p_{1m}}{p_{0m}} \le \frac{\beta}{1 - \alpha}$, accept $H_0$

(ii) if $\frac{p_{1m}}{p_{0m}} \ge \frac{1 - \beta}{\alpha}$, accept $H_1$

(iii) if $\frac{\beta}{1 - \alpha} < \frac{p_{1m}}{p_{0m}} < \frac{1 - \beta}{\alpha}$, take an additional observation
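
The three-way rule in (3) can be sketched as a short routine. This is an
illustration under stated assumptions rather than a definitive
implementation: the function name, the string labels, and the idea of passing
in a function `log_ratio` that returns $\log f(x; \theta_1) - \log f(x; \theta_0)$
for one observation are all choices made here. Working with the logarithm of
the ratio is equivalent to (3) because the logarithm is increasing, and it
avoids very small products.

```python
from math import log


def sprt(observations, log_ratio, alpha, beta):
    """Apply rule (3) to successive observations.

    Returns a decision label and the number of observations used.
    """
    lower = log(beta / (1 - alpha))   # log of c_1
    upper = log((1 - beta) / alpha)   # log of c_2
    total = 0.0                       # running log of p_1m / p_0m
    m = 0
    for m, x in enumerate(observations, start=1):
        total += log_ratio(x)
        if total <= lower:
            return "accept H0", m
        if total >= upper:
            return "accept H1", m
    return "continue", m              # data exhausted before a decision


def coin_log_ratio(x):
    """Hypothetical example: success probability .5 under H0, .7 under H1."""
    return log(0.7 / 0.5) if x == 1 else log(0.3 / 0.5)


print(sprt([1, 1, 1, 1, 1, 1, 1, 1], coin_log_ratio, alpha=0.10, beta=0.20))
print(sprt([0, 0, 0, 0], coin_log_ratio, alpha=0.10, beta=0.20))
```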

One of the striking features of this test is that it is not necessary to 
derive the frequency function of a statistic such as $t$ or $F$ in order to
carry out the test. Furthermore, one can decide in advance what size 
type I and type II errors to tolerate rather than fix the type I error and 
then be forced to calculate the type II error as in fixed-size sample tests. 
On the other hand, one never knows how large a sample will be required 
to arrive at a decision because n, the size sample needed, is now a random 
variable. A general formula exists for calculating the mean value of n, 
so that one can determine in advance how large n is likely to be. 





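As an illustration of how a sequential test is constructed, consider the
problem of determining whether the mean of a normal variable with
variance 1 has the mean $\theta_0$ or the mean $\theta_1$. Here

$$f(x; \theta) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \theta)^2}$$

hence (1) becomes

$$\frac{p_{1m}}{p_{0m}} = \frac{\exp\left[-\frac{1}{2}\sum_{i=1}^{m}(x_i - \theta_1)^2\right]}{\exp\left[-\frac{1}{2}\sum_{i=1}^{m}(x_i - \theta_0)^2\right]} = \exp\left[(\theta_1 - \theta_0)\sum_{i=1}^{m} x_i - \frac{m}{2}(\theta_1^2 - \theta_0^2)\right]$$

Now (iii) of (3) is equivalent to

$$\log\frac{\beta}{1 - \alpha} < \log\frac{p_{1m}}{p_{0m}} < \log\frac{1 - \beta}{\alpha}$$

For this problem, these inequalities become

$$\log\frac{\beta}{1 - \alpha} + \frac{m}{2}(\theta_1^2 - \theta_0^2) < (\theta_1 - \theta_0)\sum_{i=1}^{m} x_i < \log\frac{1 - \beta}{\alpha} + \frac{m}{2}(\theta_1^2 - \theta_0^2)$$

If $\theta_1 > \theta_0$, this is equivalent to

$$\frac{1}{\theta_1 - \theta_0}\log\frac{\beta}{1 - \alpha} + \frac{m}{2}(\theta_0 + \theta_1) < \sum_{i=1}^{m} x_i < \frac{1}{\theta_1 - \theta_0}\log\frac{1 - \beta}{\alpha} + \frac{m}{2}(\theta_0 + \theta_1) \tag{4}$$

For $\theta_1 < \theta_0$, these inequalities would be reversed.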

As a numerical illustration, suppose that $\alpha = .05$, $\beta = .10$,
$\theta_0 = 9.5$, and $\theta_1 = 10$. Then (4) becomes

$$-4.50 + 9.75m < \sum_{i=1}^{m} x_i < 5.78 + 9.75m$$

Following (3), the test now proceeds as follows:

(i) if $\sum_{i=1}^{m} x_i \le -4.50 + 9.75m$, accept $\theta = 9.5$

(ii) if $\sum_{i=1}^{m} x_i \ge 5.78 + 9.75m$, accept $\theta = 10$

(iii) if neither inequality is satisfied, take another observation



Table 1

 m    -4.50 + 9.75m    $\sum x_i$    5.78 + 9.75m
 1         5.25          10.47         15.53
 2        15.00          20.98         25.28
 3        24.75          30.76         35.03
 4        34.50          42.93         44.78
 5        44.25          52.88         54.53
 6        54.00          64.10         64.28
 7        63.75          73.41         74.03
 8        73.50          83.16         83.78
 9        83.25          91.72         93.53
10        93.00         101.24        103.28
11       102.75         112.89        113.03
12       112.50         123.24        122.78

An experiment was performed by taking successive samples from a
normal population with mean $\theta = 10$ and variance 1 until a decision was
reached. The decision to accept $H_1$, which is the correct decision here,
occurred at the twelfth observation. The values of $\sum x_i$ obtained in the
experiment, together with the values of the decision boundaries, are
displayed in Table 1.
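
The boundaries in (4) and the outcome of this experiment can be checked
with a short sketch. It uses the cumulative sums $\sum x_i$ reported in Table 1
and assumes $\theta_1 > \theta_0$ and unit variance, as in the text; the function
names and decision labels are illustrative.

```python
from math import log


def normal_mean_boundaries(theta0, theta1, alpha, beta, m):
    """Lower and upper bounds on sum(x_i) from inequality (4), theta1 > theta0."""
    scale = 1.0 / (theta1 - theta0)
    centre = 0.5 * (theta0 + theta1) * m
    lower = scale * log(beta / (1 - alpha)) + centre
    upper = scale * log((1 - beta) / alpha) + centre
    return lower, upper


def normal_sprt(cumulative_sums, theta0, theta1, alpha, beta):
    """Walk through successive cumulative sums until a boundary is reached."""
    m = 0
    for m, total in enumerate(cumulative_sums, start=1):
        lower, upper = normal_mean_boundaries(theta0, theta1, alpha, beta, m)
        if total <= lower:
            return "accept theta0", m
        if total >= upper:
            return "accept theta1", m
    return "continue", m


# Cumulative sums of the observations, as reported in Table 1.
cumulative_sums = [10.47, 20.98, 30.76, 42.93, 52.88, 64.10,
                   73.41, 83.16, 91.72, 101.24, 112.89, 123.24]

print(normal_mean_boundaries(9.5, 10.0, 0.05, 0.10, 1))
print(normal_sprt(cumulative_sums, 9.5, 10.0, 0.05, 0.10))
```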

As a second illustration, consider the problem of determining whether
$p = p_0$ or $p = p_1$ for a binomial distribution. If one chooses $x = 1$ for
success and $x = 0$ for failure, $f(x; \theta)$ will be given by $f(1; p) = p$ and
$f(0; p) = q$. Now, suppose that the first $m$ trials of the event produced
$d_m$ successes. Then the likelihood function $\prod_{i=1}^{m} f(x_i; \theta)$ will
consist of the product of $p$'s and $q$'s, a $p$ occurring as a factor whenever
a success occurred and a $q$ otherwise. The likelihood ratio (1) then becomes

$$\frac{p_{1m}}{p_{0m}} = \frac{p_1^{d_m} q_1^{m - d_m}}{p_0^{d_m} q_0^{m - d_m}}$$

If this expression is substituted in (3) and the desired numerical values
are assigned to $p_0$, $p_1$, $\alpha$, and $\beta$, the test procedure will be
determined.

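As a numerical illustration, let $p_0 = .5$, $p_1 = .7$, $\alpha = .10$, and
$\beta = .20$. These values may be thought of as those that might be used to
test the honesty of a coin when that coin is suspected of giving too many
heads. Here $\beta/(1 - \alpha) = \frac{2}{9}$, $(1 - \beta)/\alpha = 8$, and

$$\frac{p_{1m}}{p_{0m}} = \frac{(.7)^{d_m}(.3)^{m - d_m}}{(.5)^m} = \left(\frac{3}{5}\right)^m \left(\frac{7}{3}\right)^{d_m}$$

The first inequality in (3),

$$\left(\frac{3}{5}\right)^m \left(\frac{7}{3}\right)^{d_m} \le \frac{2}{9}$$

can be written more conveniently in the form

$$d_m \le \frac{\log\frac{2}{9}}{\log\frac{7}{3}} + m\,\frac{\log\frac{5}{3}}{\log\frac{7}{3}}$$

In a similar manner the second inequality becomes

$$d_m \ge \frac{\log 8}{\log\frac{7}{3}} + m\,\frac{\log\frac{5}{3}}{\log\frac{7}{3}}$$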








Table 2

m        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
$x_m$    0   0   1   1   1   0   1   1   0   1   0   0   1   0   0
$d_m$    0   0   1   2   3   3   4   5   5   6   6   6   7   7   7


If these logarithms are evaluated, the test will proceed as follows:

(i) if $d_m \le -1.78 + .603m$, accept $p = .5$

(ii) if $d_m \ge 2.45 + .603m$, accept $p = .7$

(iii) if neither inequality is satisfied, take another trial

Tosses of a coin gave the results shown in Table 2. For the purpose of determining when one of the inequalities is satisfied, it is convenient to represent these inequalities and the results of the successive trials graphically. If $m$ and $d_m$ are treated as the coordinates of a point, the straight lines $d_m = -1.78 + .603m$ and $d_m = 2.45 + .603m$ will serve to divide the $m, d_m$ plane into three regions corresponding to the three possible decisions at each trial. The graph corresponding to this problem is given in Fig. 1. From this graph it will be observed that the experiment terminated after 15 trials because inequality (i) was then satisfied. In accepting the hypothesis that $p = .5$, the experimenter does so in preference to accepting the hypothesis that $p = .7$.

Fig. 1. Sequential test for testing p = .5 against p = .7.
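The trial-by-trial procedure can be sketched in a few lines. This is only an illustration: the list `tosses` is a reading of Table 2 in which the fifteenth toss is taken to be a tail, which is what the stated termination at trial 15 requires, and the function name `run_sprt` is hypothetical.

```python
# Outcomes of the 15 tosses as read from Table 2 (1 = head, 0 = tail).
# The last entry is an assumption consistent with stopping at trial 15.
tosses = [0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]

def run_sprt(xs, lower=-1.78, upper=2.45, slope=0.603):
    """Apply rules (i)-(iii) after each trial and report the first decision."""
    d = 0  # running number of successes, d_m
    for m, x in enumerate(xs, start=1):
        d += x
        if d <= lower + slope * m:
            return m, "accept p = .5"
        if d >= upper + slope * m:
            return m, "accept p = .7"
    return len(xs), "continue sampling"

print(run_sprt(tosses))  # (15, 'accept p = .5')
```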

As stated earlier, sequential methods often reduce considerably the size sample needed to arrive at a reliable decision in testing hypotheses. For example, in the preceding illustration it is not difficult to show that a fixed-size sample of approximately 26 will suffice to test $H_0: p = .5$ against $H_1: p = .7$ with $\alpha = .10$ and yield a type II error of $\beta = .20$. However, in the theory of sequential analysis it can be shown that the average size sample needed to arrive at a decision in this illustration with $\alpha = .10$ and $\beta = .20$ is approximately 13. The experimental result of 15 is in good agreement with expectation. For problems of this type it can be shown that one saves 40 to 50 per cent in sampling by using sequential methods. This follows from a formula in 14.1.1 which gives a good approximation to the value of $E[n]$.
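The fixed-size figure of about 26 can be checked with the normal approximation to the binomial. The text does not say which method it has in mind, so the formula below is an assumption: the usual one-sided sample-size expression $n = \left[(z_\alpha\sqrt{p_0 q_0} + z_\beta\sqrt{p_1 q_1})/(p_1 - p_0)\right]^2$. The function name is illustrative.

```python
from statistics import NormalDist

def fixed_sample_size(p0, p1, alpha, beta):
    """Approximate n for a one-sided fixed-size test of p0 against p1 > p0."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # upper alpha point of N(0, 1)
    z_b = NormalDist().inv_cdf(1 - beta)    # upper beta point of N(0, 1)
    num = z_a * (p0 * (1 - p0)) ** 0.5 + z_b * (p1 * (1 - p1)) ** 0.5
    return (num / (p1 - p0)) ** 2

n = fixed_sample_size(0.5, 0.7, 0.10, 0.20)
print(round(n, 1))  # 26.3
```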

The reason for the advantage of the sequential approach over the fixed- 
size-sample approach lies in the ability of the sequential method to reach 
an early decision for samples that are extremely favorable to either $H_0$
or $H_1$. Thus, if a good coin were tossed a number of times and gave rise
to a fairly high percentage of tails, it would be clear rather early that
$p = .5$ should be accepted in preference to $p = .7$. Conversely, if the
coin were biased toward heads and if a high percentage of heads occurred
in the early stages one would accept $p = .7$ without continuing the
experiment further.

The savings that can be realized by the sequential approach may be 
even greater than theory would indicate because in real-life experiments 
the actual value of $p$, for example, might heavily favor either $H_0$ or $H_1$.
Thus in the preceding illustration, if it were true that $p = .4$, the sequential
test would quickly accept $H_0$ in preference to $H_1$. This ability to arrive at
an early decision can be very useful in such fields as sampling inspection, 
where it is not uncommon for lots to be very bad when they are bad or 
very good when they are good. 

The sequential probability ratio test given by (3) possesses the disadvantage that, strictly speaking, it applies only to testing a simple hypothesis $\theta = \theta_0$ against a simple alternative $\theta = \theta_1$. This disadvantage can often be circumvented by properly rephrasing the problem to be solved. For example, suppose that a consumer wishes to determine whether a producer's fraction defective is actually $p_0$ as claimed by the producer. He may fear that the true fraction defective is larger than $p_0$; consequently he would be interested in testing $p = p_0$ against $p > p_0$. But he may be willing to state a value of $p > p_0$, say $p = p_1$, such that it would begin to become a serious matter if $p$ exceeded $p_1$; whereas, if $p$ were less than $p_1$, even though $p > p_0$, no serious harm would result. By this device of deciding on an upper limit $p_1$ for $p$, the problem can be reduced to the ordinary sequential test of testing $H_0: p = p_0$ against $H_1: p = p_1$. Similar devices can often be used to arrive at satisfactory tests for composite hypotheses as well.


14.1.1 Approximations for $c_1$, $c_2$, and $E[n]$

For each value of $m = 1, 2, 3, \ldots$, let the sample space be divided into the three regions $R_m^0$, $R_m^1$, and $R_m$ corresponding to the three possible decisions of accepting $H_0$, accepting $H_1$, or continuing to sample, respectively. These regions are in the $m$-dimensional space determined by the variables $x_1, x_2, \ldots, x_m$. When $m = n$, the sample point must have fallen in $R_m$ for all values of $m < n$ because $n$ denotes the sample size for which the decision to accept $H_0$ or $H_1$ is first made. In terms of this notation, consider the problem of calculating the value of $1 - \beta$. In this calculation the sample point is denoted by $x$, regardless of the dimension of the sample space. Thus

$$1 - \beta = P\{\text{accept } H_1 \mid H_1\} = \sum_{n=1}^{\infty} P\{x \in R_n^1 \mid H_1\} = \sum_{n=1}^{\infty} \int \cdots \int_{R_n^1} p_{1n}\, dx_1 \cdots dx_n$$

The symbol $P\{x \in R_n^1 \mid H_1\}$ denotes the probability that the sample point $x$ will fall in the region $R_n^1$, given that $H_1$ holds. But the sample point $x$ can be in $R_n^1$ only if it satisfies the inequality $p_{1n}/p_{0n} \ge c_2$; consequently this inequality holds for all points in $R_n^1$. As a result,

$$1 - \beta \ge \sum_{n=1}^{\infty} \int \cdots \int_{R_n^1} c_2\, p_{0n}\, dx_1 \cdots dx_n = c_2 \sum_{n=1}^{\infty} P\{x \in R_n^1 \mid H_0\} = c_2\, P\{\text{accept } H_1 \mid H_0\} = c_2\, \alpha$$

This shows that the number $c_2$ satisfies the inequality

$$c_2 \le \frac{1 - \beta}{\alpha} \tag{5}$$




The same type of calculations when applied to finding the value of $\beta$ will lead to the inequality

$$c_1 \ge \frac{\beta}{1 - \alpha} \tag{6}$$

It is illuminating to look at these inequalities from a geometrical point of view. For this purpose the two lines in the $\alpha, \beta$ plane whose equations are

$$c_2\,\alpha = 1 - \beta \quad \text{and} \quad c_1(1 - \alpha) = \beta \tag{7}$$

have been graphed in Fig. 2 for a typical choice of $c_1$ and $c_2$. These are merely the relations given by the formulas in (2). The shaded area represents all pairs of $\alpha, \beta$ that satisfy the inequalities (5) and (6).

Suppose $\alpha$ and $\beta$ are temporarily chosen equal to .10 and that $c_1$ and $c_2$ are chosen to be the corresponding values given by (2). Then

$$c_1 = \tfrac{1}{9} \quad \text{and} \quad c_2 = 9$$

The values of $c_1$ and $1/c_2$ shown in Fig. 2 are somewhat larger than these values; consequently the shaded region for these choices would be smaller and more nearly a square region than that shown in Fig. 2. The point of intersection of the two lines would, of course, be given by $\alpha = .10$ and $\beta = .10$. Since the shaded region is very nearly square, it follows that when $c_1$ and $c_2$ are chosen in this manner, the actual value of $\alpha$ (and similarly of $\beta$) can possibly exceed .10 by only a small amount. Furthermore, if one of them should exceed .10, then the shaded area shows that the other would need to be less than .10. The greater the excess over .10 for one of them, the smaller in value the other must be.





In view of the preceding discussion, if $c_1$ and $c_2$ are chosen to be the values given by (2), the true values of $\alpha$ and $\beta$ will probably differ very little from those that were selected to be used in (2), particularly when $c_1$ and $1/c_2$ are very small. As a consequence, if a sequential test is constructed with values of $c_1$ and $c_2$ given by (2), it will for all practical purposes possess sizes of the two types of error that do not exceed the selected values of $\alpha$ and $\beta$. Thus one can decide in advance what protection against error is desirable and then by choosing $c_1$ and $c_2$ as in (2) be assured that at least this much protection will be realized in carrying out the test.
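Inequalities (5) and (6) give simple ceilings on the true error sizes: since $1 - \beta \le 1$ and $1 - \alpha \le 1$, the true $\alpha$ cannot exceed $1/c_2$ and the true $\beta$ cannot exceed $c_1$. The sketch below evaluates these ceilings for nominal values; the function name `error_bounds` is illustrative, and $c_1$, $c_2$ are taken in the form $\beta/(1-\alpha)$ and $(1-\beta)/\alpha$.

```python
def error_bounds(alpha, beta):
    """Return c1, c2 and the largest alpha and beta allowed by (5) and (6)."""
    c1 = beta / (1 - alpha)
    c2 = (1 - beta) / alpha
    alpha_max = 1 / c2   # from c2 * alpha <= 1 - beta <= 1
    beta_max = c1        # from beta <= c1 * (1 - alpha) <= c1
    return c1, c2, alpha_max, beta_max

c1, c2, alpha_max, beta_max = error_bounds(0.10, 0.10)
print(round(c1, 4), round(c2, 4), round(alpha_max, 4), round(beta_max, 4))
# 0.1111 9.0 0.1111 0.1111
```

With nominal sizes of .10, neither true error size can exceed about .111, which is the "small amount" referred to above.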

The derivation of the formula alluded to earlier for approximating the value of $E[n]$ is quite lengthy; consequently, only the result of the derivation is given here. In this connection, let $P(\theta)$ denote the probability of rejecting $H_0$ if $\theta$ is the true parameter value. Thus $P(\theta_0)$ is equal to $\alpha$ and $1 - P(\theta_1)$ is equal to $\beta$ when testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$. Further, let $z = \log f(x; \theta_1) - \log f(x; \theta_0)$. Then the desired formula may be expressed as follows:

$$E[n] \doteq \frac{P(\theta)\log c_2 + [1 - P(\theta)]\log c_1}{E[z]} \tag{8}$$

In applying this formula, one uses the approximations for $c_1$ and $c_2$ given in (2).

For the purpose of illustrating how to use this formula, consider the binomial distribution problem that was discussed in the preceding section. For that problem,

$$f(x; p) = \begin{cases} p, & \text{if } x = 1 \\ q, & \text{if } x = 0 \end{cases}$$

Since the value of $E[n]$ depends on what value is assigned to $\theta$, it is necessary to specify the value of $p$. The hypothesis $H_0$ was actually true here; consequently in making comparisons the value of $E_0[n]$ will be used. When $H_0$ is true

$$E_0[z] = E_0\left[\log\frac{f(x; \theta_1)}{f(x; \theta_0)}\right] = q_0\log\frac{q_1}{q_0} + p_0\log\frac{p_1}{p_0}$$

In view of the numerical values $p_0 = .5$, $p_1 = .7$, $\alpha = .10$, and $\beta = .20$ and the fact that $P(\theta_0) = \alpha$, it follows from (8) and calculations that

$$E_0[n] \doteq \frac{.10\log 8 + .90\log\frac{2}{9}}{.5\log\frac{3}{5} + .5\log\frac{7}{5}} \doteq 13$$




This is the value that was claimed for $E[n]$ in the earlier illustration.
Similar calculations will show that if $H_1$ is true then $E_1[n] \doteq 17$.
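Both expected sample sizes can be reproduced numerically from formula (8). The following is a rough sketch for the binomial case only; the function name and argument names are illustrative, natural logarithms are assumed (the ratio does not depend on the base), and $P(\theta_1)$ is taken as $1 - \beta = .80$.

```python
import math

def expected_sample_size(p_true, p0, p1, alpha, beta, prob_reject_h0):
    """Approximate E[n] from formula (8) for a binomial sequential test."""
    c1 = beta / (1 - alpha)
    c2 = (1 - beta) / alpha
    # E[z] under the true p, where z = log f(x; p1) - log f(x; p0)
    ez = (1 - p_true) * math.log((1 - p1) / (1 - p0)) + p_true * math.log(p1 / p0)
    num = prob_reject_h0 * math.log(c2) + (1 - prob_reject_h0) * math.log(c1)
    return num / ez

e0 = expected_sample_size(0.5, 0.5, 0.7, 0.10, 0.20, prob_reject_h0=0.10)
e1 = expected_sample_size(0.7, 0.5, 0.7, 0.10, 0.20, prob_reject_h0=0.80)
print(round(e0), round(e1))  # 13 17
```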


14.2 Multiple Classification Techniques 

In all the problems of testing hypotheses that have been considered 
thus far it was necessary either to accept or reject some hypothesis. This 
is true for sequential methods also, even though a final decision may be 
postponed for some time. There are many problems, however, that 
cannot be treated in this simple manner because they involve more than 
just two possible decisions. For example, a botanist may wish to classify 
a group of plants belonging to three different varieties into their proper 
variety. This is a three-decision problem and it is unnatural to attempt 
to solve it, say, by successive two-decision-problem methods. In this 
section a multiple-decision technique for such classification problems is 
presented as an introduction to general multiple decision methods. 

For simplicity of exposition the following discussion is limited to the 
case in which there are three possible categories of classification; however, 
the method presented can obviously be extended to any number of 
categories. 

Let $x_1, x_2, \ldots, x_k$ denote $k$ random variables corresponding to $k$ different measurements that are to be taken on an individual of some population. Thus, for a population of flowers, the $x$'s might represent such characteristics as petal length, petal width, and stamen length. Let the population to be sampled consist of the three subpopulations $\pi_1$, $\pi_2$, $\pi_3$ and let these subpopulations constitute the proportions $p_1$, $p_2$, and $p_3$ of this population.

Since an individual will be determined by his values of $x_1, x_2, \ldots, x_k$, the problem of classification is the problem of dividing the $k$-dimensional sample space into three parts, say, $S_1$, $S_2$, and $S_3$, corresponding to the three subpopulations, and agreeing to classify an individual as belonging to subpopulation $\pi_i$ if his sample point lies in $S_i$. In this connection, it seems reasonable to choose as a criterion of optimality a division of the sample space that will maximize the probability of classifying an individual correctly.

From the theory in 9.1.3 on how best tests are constructed and from the theory of sequential analysis, one would guess that the ratios of likelihood functions will undoubtedly play a leading role in the determination of a best set of $S$'s. For example, a sample point at which the probability density under $\pi_1$ is considerably larger than under either $\pi_2$ or $\pi_3$ should certainly be placed in the region $S_1$. The feature that makes this problem somewhat different from earlier problems, in addition to that of having more possibilities for decisions, is the introduction of the proportions $p_1$, $p_2$, and $p_3$ for the relative frequency of occurrence of samples from the various subpopulations. If, for example, $p_3$ were close to zero, then for all practical purposes the problem would reduce to a two-decision problem and earlier methods could be used to solve it. Thus it is clear that the $p$'s must also enter in the determination of a best set of $S$'s. As in the theory of best tests, a theorem will be stated, and then proved, that yields the desired optimum solution. In this theorem the letter $x$ is used to denote the vector variable $x = (x_1, x_2, \ldots, x_k)$, and $f_i(x)$ will denote the frequency function of the variables $x_1, x_2, \ldots, x_k$ in the subpopulation $\pi_i$ $(i = 1, 2, 3)$.

Theorem: The regions $R_i$ $(i = 1, 2, 3)$ into which a $k$-dimensional sample space should be divided to maximize the probability of correctly classifying an individual selected at random from the population composed of the subpopulations $\pi_i$ $(i = 1, 2, 3)$, which constitute the proportions $p_i$ $(i = 1, 2, 3)$ of that population and which possess the frequency functions $f_i(x)$ $(i = 1, 2, 3)$, are determined by the points $x$ that satisfy the inequalities

$$R_i: \quad p_i f_i(x) \ge p_j f_j(x), \qquad (j = 1, 2, 3)$$

Proof: Let $S_1$, $S_2$, $S_3$ be any other division of the sample space. Now any one of these regions $S_i$ $(i = 1, 2, 3)$ can be divided into three subregions $S_{i1}$, $S_{i2}$, and $S_{i3}$ such that $S_{ij}$ $(j = 1, 2, 3)$ contains only points found in $R_j$. Thus $S_{ij}$ contains all the points common to the two regions $S_i$ and $R_j$. This subdivision of $S_i$ can be expressed by means of the formula

$$S_i = S_{i1} + S_{i2} + S_{i3}, \qquad (i = 1, 2, 3) \tag{9}$$

In terms of this notation, it is also true that

$$R_j = S_{1j} + S_{2j} + S_{3j}, \qquad (j = 1, 2, 3) \tag{10}$$

This formula follows from the fact that every point in $R_j$ must belong to one of the three regions $S_1$, $S_2$, $S_3$.

Now the probability of correctly classifying an individual selected at random from the population when using the division of the sample space determined by the $S$'s is given by

$$P_S = \sum_{i=1}^{3} p_i\, P\{x \in S_i \mid \pi_i\} = \sum_{i=1}^{3} p_i \int_{S_i} f_i(x)\, dx$$

The preceding integrals are $k$-dimensional multiple integrals with the arguments $x_1, x_2, \ldots, x_k$; however, they are written symbolically as single integrals with respect to $x$. This is the same convenient notational device that was used in 9.1.3. Now, by means of formula (9), $P_S$ can be written in the form

$$P_S = \sum_{i=1}^{3}\left[\int_{S_{i1}} p_i f_i(x)\, dx + \int_{S_{i2}} p_i f_i(x)\, dx + \int_{S_{i3}} p_i f_i(x)\, dx\right] \tag{11}$$

Since all the points lying in $S_{i1}$ are in $R_1$, it follows from the definition of $R_1$, given in the theorem, that they must satisfy the inequality

$$p_1 f_1(x) \ge p_j f_j(x), \qquad (j = 2, 3) \tag{12}$$

Similarly, all points in $S_{i2}$ must satisfy

$$p_2 f_2(x) \ge p_j f_j(x), \qquad (j = 1, 3) \tag{13}$$

and all points in $S_{i3}$ must satisfy

$$p_3 f_3(x) \ge p_j f_j(x), \qquad (j = 1, 2) \tag{14}$$

In view of (12), if $p_i f_i(x)$ is replaced by $p_1 f_1(x)$ in the first of the three integrals in (11), that integral will become at least as large as it was before, for all three values of $i$. For $i = 1$ there is, of course, no change; however, for $i = 2$ and $i = 3$ the value of the integral will be increased unless inequality (12) is an equality for the points of $S_{i1}$. Similarly, because of (13), if $p_i f_i(x)$ is replaced by $p_2 f_2(x)$ in the second of the three integrals in (11), that integral will become at least as large as before. Finally, the same conclusion will hold for the third integral if $p_i f_i(x)$ is replaced by $p_3 f_3(x)$. Thus, it follows that

$$P_S \le \sum_{i=1}^{3}\left[\int_{S_{i1}} p_1 f_1(x)\, dx + \int_{S_{i2}} p_2 f_2(x)\, dx + \int_{S_{i3}} p_3 f_3(x)\, dx\right] \tag{15}$$

But from (10) it follows that $S_{11} + S_{21} + S_{31} = R_1$ and therefore that the sum with respect to $i$ of the first integral in (15) must yield the integral of $p_1 f_1(x)$ over the region $R_1$. Similar reasoning may be applied to the other two sums of integrals to yield the result

$$P_S \le \int_{R_1} p_1 f_1(x)\, dx + \int_{R_2} p_2 f_2(x)\, dx + \int_{R_3} p_3 f_3(x)\, dx \tag{16}$$

If the $p$'s are factored out and the right side is expressed in probability language, (16) will yield the desired result, namely

$$P_S \le \sum_{i=1}^{3} p_i\, P\{x \in R_i \mid \pi_i\} = P_R$$





Since the set of regions $S_1$, $S_2$, $S_3$ was any set other than $R_1$, $R_2$, $R_3$ defined in the theorem, this inequality proves that $R_1$, $R_2$, $R_3$ is an optimum set in the sense of maximizing the probability of a correct classification.

Points that satisfy the equality part of the inequalities defining the
regions $R_1$, $R_2$, $R_3$ may be placed in any one of the regions for which the
equality holds. In applications the boundaries of the regions are usually
surfaces ($k \ge 3$) so that there is seldom any problem on this score.

The preceding method of proof can obviously be applied to the case in 
which there are more than three categories of classification. 

The difficulty in applying the preceding theorem arises from the fact
that one seldom knows the values of the $p$'s and even the values of the
parameters determining the density functions. Then it is necessary to
estimate such parameters by means of a random sample from the total
population and in the process be able to classify each individual into its
proper subpopulation. The resulting decision regions will be estimates of
the optimum decision regions given by the theorem.

As an illustration of how the theorem is applied when no estimation is required, consider the following information. A population consists of three subpopulations in the proportions $\tfrac14$, $\tfrac14$, and $\tfrac12$. Each subpopulation is a two-variable normal population with independently distributed variables possessing unit variances and means $(-1, 0)$, $(0, 1)$, and $(1, 0)$, respectively.

In the notation of the theorem $p_1 = \tfrac14$, $p_2 = \tfrac14$, $p_3 = \tfrac12$,

$$f_1(x) = \frac{e^{-\frac12[(x_1+1)^2 + x_2^2]}}{2\pi}, \qquad f_2(x) = \frac{e^{-\frac12[x_1^2 + (x_2-1)^2]}}{2\pi}, \qquad f_3(x) = \frac{e^{-\frac12[(x_1-1)^2 + x_2^2]}}{2\pi}$$

The region $R_1$ is therefore the region determined by the two inequalities

$$\frac14\,\frac{e^{-\frac12[(x_1+1)^2 + x_2^2]}}{2\pi} \ge \frac14\,\frac{e^{-\frac12[x_1^2 + (x_2-1)^2]}}{2\pi}$$

and

$$\frac14\,\frac{e^{-\frac12[(x_1+1)^2 + x_2^2]}}{2\pi} \ge \frac12\,\frac{e^{-\frac12[(x_1-1)^2 + x_2^2]}}{2\pi}$$

These inequalities are easily shown to reduce to the inequalities

$$x_1 + x_2 \le 0 \quad \text{and} \quad x_1 \le -\tfrac12\log_e 2$$

Similar calculations will show that $R_2$ is determined by the inequalities

$$x_1 + x_2 \ge 0 \quad \text{and} \quad x_2 - x_1 \ge \log_e 2$$

The region $R_3$ is clearly the remaining part of the plane not occupied by $R_1$ and $R_2$; however, one can calculate it directly from definition and show that it is determined by the inequalities

$$x_1 \ge -\tfrac12\log_e 2 \quad \text{and} \quad x_2 - x_1 \le \log_e 2$$

In order to determine the regions $R_1$, $R_2$, and $R_3$, it suffices to graph the lines whose equations are

$$L_1:\ x_1 + x_2 = 0$$

$$L_2:\ x_1 = -\tfrac12\log_e 2$$

$$L_3:\ x_2 - x_1 = \log_e 2$$

The graphs of these lines are shown in Fig. 3. From the inequalities defining $R_1$, $R_2$, and $R_3$ it is clear that $R_1$ is the region below line $L_1$ and to the left of line $L_2$, that $R_2$ is the region above $L_1$ and above $L_3$, and that $R_3$ is the remaining part of the $x_1, x_2$ plane. These regions are shown more clearly in Fig. 4.

If the $p$'s had all been equal, $\log 2$ would have been replaced by $\log 1 = 0$ in the preceding inequalities and then the regions would have been those obtained by shifting the three-ray boundary configuration of Fig. 4 parallel to itself until the vertex was at the origin. This illustrates the effect the $p$'s have on the classification regions.

Fig. 3. Boundaries for classification regions.

Fig. 4. Optimum classification regions.
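The rule of the theorem amounts to assigning a point to the subpopulation with the largest value of $p_i f_i(x)$. The sketch below applies that rule to the three-population example, taking the proportions as 1/4, 1/4, 1/2 and the means as $(-1, 0)$, $(0, 1)$, $(1, 0)$, and compares it with the linear boundaries $x_1 + x_2 = 0$, $x_1 = -\tfrac12\log_e 2$, and $x_2 - x_1 = \log_e 2$. All function and constant names are illustrative.

```python
import math

PROPORTIONS = [0.25, 0.25, 0.5]
MEANS = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]

def weighted_density(i, x1, x2):
    """p_i * f_i(x) for independent unit-variance normal variables."""
    m1, m2 = MEANS[i]
    return PROPORTIONS[i] * math.exp(-0.5 * ((x1 - m1) ** 2 + (x2 - m2) ** 2)) / (2 * math.pi)

def classify(x1, x2):
    """Return 1, 2, or 3: the subpopulation with the largest p_i * f_i(x)."""
    scores = [weighted_density(i, x1, x2) for i in range(3)]
    return scores.index(max(scores)) + 1

def classify_by_lines(x1, x2):
    """Same decision, expressed through the three boundary lines."""
    if x1 + x2 <= 0 and x1 <= -0.5 * math.log(2):
        return 1
    if x1 + x2 >= 0 and x2 - x1 >= math.log(2):
        return 2
    return 3

print(classify(-2, 0), classify(0, 3), classify(0, 0))  # 1 2 3
```

Points lying exactly on a boundary may be assigned to either adjoining region, so the two functions are only expected to agree away from the lines.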

14.3 Bayes Techniques

A somewhat different approach to decision making can be formulated in terms of economic losses or gains rather than in terms of the probability of making the correct decision. It may well be, for example, that making one of two possible incorrect decisions is not nearly so serious economically as making the other incorrect decision. In such situations it would be desirable to weight the relative importance of the various errors that can be made. Decision-making procedures based on such notions can be constructed that are capable of treating both estimation and hypothesis testing problems. One such procedure is discussed here from the estimation point of view only.

In constructing decision-making procedures of the foregoing type, it is necessary to introduce the concept of a frequency function of the parameter that is being estimated or tested. This concept essentially distinguishes Bayesian methods from the more traditional ones.

Let $x$ be a continuous random variable with frequency function $f(x; \theta)$ and let $t = t(x_1, x_2, \ldots, x_n)$ be any estimate of $\theta$ based on a random sample of size $n$. Furthermore, let $W(t, \theta)$ be a weight function that measures the economic loss in claiming (estimating) that the true value of the parameter is $t$ when it is actually $\theta$. For example, one might choose $W$ as the function $W(t, \theta) = c(t - \theta)^2$ if large errors of estimation are very serious or as the function $W(t, \theta) = c|t - \theta|$ if they are not quite so serious.

As a criterion for determining whether $t = t(x_1, \ldots, x_n)$ is a good estimate, one can use the expected value of the weight function. Thus one considers the quantity

$$r(t, \theta) = E[W(t, \theta)] = \int \cdots \int W(t, \theta) \prod_{i=1}^{n} f(x_i; \theta)\, dx_1 \cdots dx_n \tag{17}$$

The weight function $W(t, \theta)$ is usually called the loss function and the quantity $r(t, \theta)$ is called the risk function. An estimate that makes the risk function small in some sense would be considered a desirable estimate. The difficulty with (17) as a basis for judgment is that the result of the integration is usually a function of $\theta$, and an estimate $t$ seldom minimizes (17) for all possible values of $\theta$. It is therefore necessary to introduce some further criteria before (17) can be used effectively to determine whether an estimate is a good one.

One approach to a solution is to study the behavior of the risk function
and use some property of it as a basis for judgment. Since, as indicated
in the preceding paragraph, it is unlikely that the graph of $r(t, \theta)$ for one
estimate will lie below the graphs of all other estimates, it is customary to
look only at the maximum value on the graph and then hunt for an
estimate $t$ that has the smallest such maximum value. If there exists an
estimate with this property, it is called the minimax estimate. Figure 5 
illustrates this criterion as it applies to only two possible estimates. For 
this situation, estimate t^ is the minimax estimate. 
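The minimax comparison can be illustrated with a small hypothetical case that is not taken from the text: a single observation $x$ from a normal population with mean $\theta$ and unit variance, squared-error loss, and the two estimates $t_1 = x$ and $t_2 = x/2$. Their risk functions are $r(t_1, \theta) = 1$ and $r(t_2, \theta) = \tfrac14 + \tfrac14\theta^2$. Restricting $\theta$ to the interval from $-3$ to $3$ is an additional assumption made so that the maximum exists.

```python
def risk_t1(theta):
    """Risk of t1 = x: the variance of x, which does not depend on theta."""
    return 1.0

def risk_t2(theta):
    """Risk of t2 = x/2: variance 1/4 plus squared bias (theta/2)**2."""
    return 0.25 + 0.25 * theta ** 2

# Grid of parameter values on the assumed range -3 <= theta <= 3.
thetas = [-3 + 0.1 * k for k in range(61)]
max_risks = {
    "t1": max(risk_t1(th) for th in thetas),
    "t2": max(risk_t2(th) for th in thetas),
}
minimax = min(max_risks, key=max_risks.get)
print(max_risks, minimax)  # {'t1': 1.0, 't2': 2.5} t1
```

Although $t_2$ has the smaller risk for values of $\theta$ near zero, its maximum risk is larger, so $t_1$ is the minimax choice among these two estimates.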

A second approach to a solution is to introduce a frequency function for the parameter $\theta$ and then calculate the expected value of $r(t, \theta)$ with respect to this frequency function. Since the result will be a number rather than a function of $\theta$, the comparison of estimates becomes a simple matter. The principal difficulty with this approach is that one seldom has any precise knowledge of what frequency function for $\theta$ is realistic or whether it is realistic to introduce such a function at all. There are problems, however, for which one can postulate a realistic distribution. For example, familiarity with manufacturing processes suggests that the assumption of a normal distribution for $p$, the probability of getting a head in tossing a randomly selected penny, is a reasonable one. One would probably choose the mean to be $\tfrac12$ but would need to perform some experiments on pennies to obtain a variance estimate. If, however, one were interested in estimating the mean brain weight of a certain race of prehistoric people by means of skulls found in an archaeological excavation, it is unnatural to assume that this mean possesses some probability distribution.

Fig. 5. Graphs of expected losses for two estimates.

For the purpose of developing this approach further, let $\lambda(\theta)$ denote the frequency function that has been selected for the parameter $\theta$. Then the expected value of $r(t, \theta)$ with respect to this frequency function is given by

$$\bar{r}(t, \lambda) = \int r(t, \theta)\, \lambda(\theta)\, d\theta$$

This expected value is usually called the mean risk. It depends on the estimating function $t(x_1, \ldots, x_n)$ selected and also on the frequency function $\lambda(\theta)$ chosen. For a given $\lambda(\theta)$, one tries to find an estimating function $t(x_1, \ldots, x_n)$ that minimizes the mean risk. If such a function exists, it is called the Bayes solution to the problem corresponding to the frequency function $\lambda(\theta)$. Different choices for $\lambda(\theta)$ may well give rise to different minimizing estimating functions; therefore one does not speak of a Bayes solution without specifying the function $\lambda(\theta)$.

Bayes methods is a name given to statistical methods that introduce
distributions of parameters at some stage of their development. Although
a Bayes approach to estimation has been explained here as an alternative
to the minimax approach, it can be shown that the Bayes technique is
often a useful one for obtaining a minimax solution; consequently,
Bayes methods are useful even in situations in which it seems unrealistic
to introduce a distribution for a parameter.

When the loss function in estimation is chosen as squared error, there is often a simple technique for determining a Bayes solution. For the present assume that only a single observation $x$ is to be made. Under these assumptions the mean risk reduces to

$$\bar{r}(t, \lambda) = \iint (t - \theta)^2 f(x; \theta)\, \lambda(\theta)\, dx\, d\theta$$

When $x$ and $\theta$ are both treated as random variables, $f(x; \theta)$ is the conditional frequency function of $x$ with $\theta$ held fixed. Consequently, if $g(x, \theta)$ denotes the joint frequency function of $x$ and $\theta$, it follows that

$$g(x, \theta) = \lambda(\theta) f(x; \theta)$$

But $g(x, \theta)$ can also be written in the form

$$g(x, \theta) = h(x)\, g(\theta \mid x) \tag{18}$$

where $h(x)$ is the marginal frequency function of $x$ and $g(\theta \mid x)$ is the conditional frequency function of $\theta$ with $x$ held fixed. The equivalence of these two ways of expressing $g(x, \theta)$ enables the mean risk to be written as

$$\bar{r}(t, \lambda) = \iint (t - \theta)^2 h(x)\, g(\theta \mid x)\, dx\, d\theta$$

or, if the order of integration is interchanged, as

$$\bar{r}(t, \lambda) = \int h(x) \int (\theta - t)^2 g(\theta \mid x)\, d\theta\, dx$$

Now the inner integral is merely the second moment about the point $t$ of the variable $\theta$, for $x$ fixed. Since the second moment of a variable is a minimum when it is taken about the mean of the variable, it follows that this integral is minimized for each value of $x$ if $t$ is chosen as the mean of the conditional distribution of $\theta$ for $x$ fixed. The double integral is therefore also minimized by this choice; consequently

$$t(x) = E(\theta \mid x)$$

yields the desired solution.

The problem of finding a Bayes solution is now seen to reduce to the problem of finding the conditional expected value of the parameter $\theta$ when the variable $x$ is held fixed. This solution is based on the assumption that only a single value of $x$ is to be observed. It is considerably more general than this, however, because $x$ may be chosen to be a statistic, such as a maximum likelihood estimate, from which an estimate $t$ is to be constructed, or it may be a vector variable. Thus, in estimating the mean of a normal distribution, $x$ might well represent $\bar{x}$ or the vector variable $(x_1, x_2, \ldots, x_n)$.

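As an illustration of how a Bayes solution is obtained when the loss function is chosen to be the squared error, let

$$f(x; \mu) = \frac{e^{-\frac12\left(\frac{x - \mu}{\alpha}\right)^2}}{\sqrt{2\pi}\,\alpha} \quad \text{and} \quad \lambda(\mu) = \frac{e^{-\frac12\left(\frac{\mu - \mu_0}{\beta}\right)^2}}{\sqrt{2\pi}\,\beta}$$

Then

$$g(x, \mu) = \lambda(\mu) f(x; \mu) = \frac{e^{-\frac12\left[\left(\frac{x - \mu}{\alpha}\right)^2 + \left(\frac{\mu - \mu_0}{\beta}\right)^2\right]}}{2\pi\alpha\beta} \tag{19}$$

The marginal frequency function $h(x)$ is obtained by integrating the joint frequency function $g(x, \mu)$ with respect to $\mu$; hence

$$h(x) = \frac{1}{2\pi\alpha\beta}\int_{-\infty}^{\infty} e^{-\frac12\left[\left(\frac{x - \mu}{\alpha}\right)^2 + \left(\frac{\mu - \mu_0}{\beta}\right)^2\right]}\, d\mu$$

This integral can be evaluated by squaring out the two binomials, collecting terms in $\mu^2$ and $\mu$, completing the square in $\mu$, and then recognizing the value of the resulting integral. The result of such manipulations is that

$$h(x) = \frac{e^{-\frac12\frac{(x - \mu_0)^2}{\alpha^2 + \beta^2}}}{\sqrt{2\pi}\sqrt{\alpha^2 + \beta^2}}$$

From (18) and (19), it then follows that

$$g(\mu \mid x) = \frac{g(x, \mu)}{h(x)}$$

But the calculations that produced $h(x)$ show at once that the expression $g(\mu \mid x)$ must reduce to

$$g(\mu \mid x) = \frac{\sqrt{\alpha^2 + \beta^2}}{\sqrt{2\pi}\,\alpha\beta}\exp\left\{-\frac{\alpha^2 + \beta^2}{2\alpha^2\beta^2}\left[\mu - \frac{x\beta^2 + \mu_0\alpha^2}{\alpha^2 + \beta^2}\right]^2\right\}$$

This result demonstrates that the conditional distribution of $\mu$, for $x$ fixed, is normal with the mean

$$E(\mu \mid x) = \frac{x\beta^2 + \mu_0\alpha^2}{\alpha^2 + \beta^2}$$

The Bayes solution for this estimation problem and for this particular choice of frequency function $\lambda(\mu)$ is therefore given by

$$t(x) = \frac{x\beta^2 + \mu_0\alpha^2}{\alpha^2 + \beta^2}$$

It is convenient to write this result in the form

$$t(x) = \frac{x + \delta\mu_0}{1 + \delta}$$

where $\delta = \alpha^2/\beta^2$ is the ratio of the two variances.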

Suppose experience has shown that a quality characteristic, such as breaking strength of a manufactured product, is a normal variable and that if $\mu$ denotes the mean of this normal variable for a shipment of this product then $\mu$ may also be treated as a normal variable, corresponding to successive shipments. Let $\mu_0$ denote the grand mean, that is the mean of the various shipment means, and assume that experience has yielded a grand mean of $\mu_0 = 200$ and a standard deviation of $\beta = 5$ for such shipment means. Now suppose a fresh shipment comes in and the mean $\mu$ of this shipment is to be estimated by means of a sample of 25 items selected at random from the shipment. Assume that experience has yielded a standard deviation of 50 for the random variable $x$ that represents the quality characteristic of a single item in a shipment. If $\bar{x}$ is now chosen as $x$ in the preceding theory, the value of $\alpha$ becomes $\alpha = 50/\sqrt{25} = 10$. Calculations with these values yield the estimate

$$t(\bar{x}) = \frac{\bar{x}}{5} + \frac{4}{5}\,200$$

This estimate gives four times as much weight to past experience as to the sample estimate $\bar{x}$. Additional calculations will show that the mean risk for this estimate, that is the value of $\bar{r}(t, \lambda)$, is $\alpha^2\beta^2/(\alpha^2 + \beta^2) = 20$.

Similar calculations will show that the mean risk for the estimate $\bar{x}$, which is based on the sample only, is $\alpha^2 = 100$. Thus there is a very large reduction in the mean risk when past experience is incorporated into the design of the estimate.
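The shipment example can be checked with a few lines of arithmetic. In this sketch the function names are illustrative, and the sample mean of 210 passed to `bayes_estimate` is a made-up value used only to show the weighting; the other numbers are those of the example.

```python
def bayes_estimate(xbar, mu0, alpha, beta):
    """Posterior mean of mu: (xbar + delta * mu0) / (1 + delta), delta = alpha^2 / beta^2."""
    delta = alpha ** 2 / beta ** 2
    return (xbar + delta * mu0) / (1 + delta)

def mean_risk_bayes(alpha, beta):
    """Mean risk of the Bayes estimate under squared-error loss."""
    return alpha ** 2 * beta ** 2 / (alpha ** 2 + beta ** 2)

alpha = 50 / 25 ** 0.5   # standard deviation of the mean of 25 items
beta = 5                 # standard deviation of shipment means

print(bayes_estimate(210, 200, alpha, beta))  # 202.0 (hypothetical sample mean 210)
print(mean_risk_bayes(alpha, beta))           # 20.0
print(alpha ** 2)                             # 100.0, mean risk of the sample mean alone
```

A sample mean ten units above the grand mean moves the estimate only two units, which reflects the weight of four to one given to past experience.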

The difficulty in applying these methods is that there is seldom expe- 
rience of the kind assumed here. Furthermore, even when there is ex- 
perience available, that experience often shows that successive shipments, 
for example, cannot be treated as random samples from a population of 
shipments. There is often strong correlation over time between the 
quality of successive shipments of a manufactured product. Thus, al- 
though the Bayes approach may yield considerably better results in the 
sense of mean risk, the results may not be trustworthy. 

The traditional method of comparing two estimates is to base the 
comparison on the mean squared error. This does not require the assump- 
tion of a probability distribution for the parameter in question; therefore 
the comparison can always be made and there is no danger of obtaining 
estimates of questionable accuracy and precision. 

REFERENCES 

The mathematical theory behind sequential analysis may be found in A. Wald, 
Sequential Analysis, John Wiley and Sons. 

The decision function approach to statistical problems can be found in H. Chernoff
and L. E. Moses, Elementary Decision Theory, John Wiley and Sons.




EXERCISES

1. Choosing a = .2 and /? = .2, test the hypothesis //<,:/>= 3 against 
H^\p ^ A sequentially by tossing a coin until a decision is reached. Here/? is the 
probability of getting a head. 

2. Choosing a =s .1 and = .1, construct a sequential test for testing JTqtcy = 

8 against ^ 10 for a normal variable with 0 mean. !i 

3. Construct a sequential test for testing == 3 against =:ii2 for the 

frequency function f{x\0) ^ ^0. Choose a = .1 and /? = .2. 

4. Using the value of $n$ needed to reach a decision in problem 1, calculate by
means of the normal approximation to the binomial what the value of $\beta$ is for
$\alpha = .2$ for a nonsequential test of $H_0$ based on this value of $n$.

5. Graph the lines in the $\alpha, \beta$ plane given by formulas (2) when $c_1$ and $c_2$ are
determined by $\alpha = \beta = .05$. From this graph observe to what extent $\alpha$ and $\beta$
can exceed .05 in value.

6. Construct a sequential test for testing the hypothesis ^ the 

alternative ^ for a Poisson distribution. n 

7. By using random sampling numbers draw repeated samples from a Poisson
population with $\mu = 2$. Use the test derived in problem 6 on these sample values
to test $\mu = 2$ against $\mu = 3$ with $\alpha = .10$ and $\beta = .10$.

8. Use formula (8) and the data for the problem used to illustrate it to verify
that $E_1[n] \doteq 17$ for that problem.

9. Calculate £■![«] for the problem displayed in Table 1 and compar^; with the 
value of n actually realized in that problem. 

10. Calculate E^ln] for problem 3 and compare with the value of « ’obtained 
in carrying out the test, 

11. For the problem related to Table 1, calculate how large a fixed-size
sample you would need to use to have $\alpha$ and $\beta$ equal to the values used in that
sequential test.

12. Let 3 single- variable normal subpopulations possess unit variances and 

means —1, 0, 1, respectively. Find what the best classification regions are if the 
population proportions are given by (a) pi — Pq = p^ = (b) pi = Ip^ == 4p^, 

13. Work problem 12 if the 3 subpopulations are 2-variable populations with
independent variables with unit variances and means (-1, 0), (0, 0), (1, 0),
respectively. Comment on the results in these 2 problems.

14. Let 3 single- variable subpopulations possess the distributions = 

0 = 1, 2, and 3, respectively, and let pi = />2 = Pai Find the best classi- 
fication regions. ;; 

15. Apply the best classification technique to two 2-variable normal sub-
populations whose means are $(-1, 0)$ and $(1, 0)$, whose correlation coefficients
are both equal to $\frac{1}{2}$, and all of whose variances are equal to 1. Assume $p_1 = p_2$.

16. Compare the method of best classification when there are 2 normal
subpopulations with the method of linear discriminant functions. Assume
$p_1 = p_2$ and that the sample is so large that sample estimates used in the
discriminant function may be treated as population values. Assume $\rho = 0$.





17. Using fV(t, 6) = It -01, /(x; 6) = 1/6, 0 < x ^ 0, and m = 

6 > 0, (a) calculate the risk function and the mean risk for the estimate t, = x. 
(b) Compare this mean risk with that for the estimate t = fji». 

18. Using W{t, 0 ) = (r - 6) = 6e~^\ x > 0, and A(0) = | 0 , 1 < 0 < 

2, calculate (a) the risk functions for the estimates tx= x and = x — I and 
determine which is the minimax estimate with respect to these 2 estimates only. 
{h) Calculate the mean risk for estimate (c) Find Bayes solutions with respect 

to estimates and ^2- _L _m2 

19. Given W{t, 0) — {t — 0)^, f(x; 6) = e I^27 t, and 2(6) = e 

V 2tt a, (a) calculate the risk function for the estimate t =x, (b) calculate the mean 
risk for this estimate. 

20. Given that x is a binomial variable with parameters n and p and that p
possesses the beta distribution

calculate the Bayes solution. 

21. Given that x is a Poisson variable with parameter $\mu$ and that $\mu$ possesses
the gamma distribution

calculate the Bayes solution. 

22. Given that x possesses the gamma distribution 

0 a 

and that 9 possesses the gamma distribution 


m 




0a-l^— /50 


calculate the Bayes solution. 



APPENDIX 1 


1. Properties of r 

The purpose of this section is to prove that $|r| \le 1$ and that $r = \pm 1$ if,
and only if, all sample points lie on a straight line.

Let $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$. Then r will assume the form

(1)    $r = \dfrac{\sum a_i b_i}{\sqrt{\sum a_i^2 \sum b_i^2}}$

In order to avoid trivial cases, which can easily be treated separately, it
will be assumed that the x's are not all equal and that the y's are not all
equal. This assumption prevents the denominator in (1) from having
the value zero.

Now consider the inequality

(2)    $\sum_{i=1}^{n} (z a_i + b_i)^2 > 0$,

where z is any real number. Since the left side is the sum of only squared
terms, this inequality will be satisfied for all values of z if, and only if,
a number $z_0$ does not exist such that

(3)    $z_0 a_i + b_i = 0, \quad i = 1, 2, \ldots, n$

Assume for the present that no such number exists. Then squaring and
summing in (2) will produce the inequality

(4)    $z^2 \sum a_i^2 + 2z \sum a_i b_i + \sum b_i^2 > 0$

The left side is a quadratic function in z which is everywhere positive;
consequently the corresponding quadratic equation must have imaginary
roots, which in turn implies that the discriminant of the quadratic must
be negative. Thus it is necessary that

$(2 \sum a_i b_i)^2 - 4 (\sum a_i^2)(\sum b_i^2) < 0$

or that

$\dfrac{(\sum a_i b_i)^2}{\sum a_i^2 \sum b_i^2} < 1$

In view of (1), this shows that $r^2 < 1$, provided that a number $z_0$ satisfying
(3) does not exist.



Now suppose a number $z_0$ does exist such that (3) holds. Then

$z_0 a_i + b_i = 0, \quad i = 1, 2, \ldots, n$

Squaring and summing will yield

(5)    $z_0^2 \sum a_i^2 + 2 z_0 \sum a_i b_i + \sum b_i^2 = 0$

This says that $z_0$ is a root of the corresponding quadratic equation. Since
there obviously cannot be two different values of $z_0$ satisfying (3), $z_0$ must
be a double root in (5). As a result, the discriminant will be equal to zero;
hence

$\dfrac{(\sum a_i b_i)^2}{\sum a_i^2 \sum b_i^2} = 1$

In view of (1), this shows that if a number $z_0$ satisfying (3) exists, then
$r^2 = 1$. Since it was shown earlier that $r^2 < 1$ unless such a number did
exist, it follows that $r^2 = 1$ if, and only if, a number $z_0$ satisfying (3) exists.
In terms of the original variables, (3) can be written in the form

$y_i - \bar{y} = -z_0 (x_i - \bar{x}), \quad i = 1, 2, \ldots, n$

In geometrical language, it therefore follows that $r^2 \le 1$ and that $r^2 = 1$ if,
and only if, the points $(x_i, y_i)$ lie on some straight line.
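
The argument can be illustrated numerically. The following is a minimal sketch that computes r from the deviations $a_i$ and $b_i$ and evaluates the discriminant of the quadratic in (4). The data values and the function name are made up for illustration and do not come from the text.

```python
import math

def correlation_and_discriminant(xs, ys):
    """Return (r, discriminant), where the discriminant belongs to the
    quadratic z^2*sum(a^2) + 2z*sum(ab) + sum(b^2) of formula (4)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    a = [x - x_bar for x in xs]          # deviations a_i = x_i - x_bar
    b = [y - y_bar for y in ys]          # deviations b_i = y_i - y_bar
    s_aa = sum(v * v for v in a)
    s_bb = sum(v * v for v in b)
    s_ab = sum(u * v for u, v in zip(a, b))
    r = s_ab / math.sqrt(s_aa * s_bb)    # formula (1)
    discriminant = (2 * s_ab) ** 2 - 4 * s_aa * s_bb
    return r, discriminant

# Hypothetical scattered data: no z_0 satisfies (3), so the discriminant
# is negative and r^2 < 1.
r_scatter, d_scatter = correlation_and_discriminant([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])

# Hypothetical points on the line y = 3 - 2x: z_0 = 2 satisfies (3), so the
# discriminant vanishes and r^2 = 1.
r_line, d_line = correlation_and_discriminant([1, 2, 3, 4], [1, -1, -3, -5])
```

In the second data set the slope of the line is negative, so r takes the value at the lower end of its range.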

Students who have seen the inequality of Schwarz will recognize that
$r^2 \le 1$ is merely a version of that inequality.


2. Likelihood Ratio Test for Goodness of Fit 

Consider k cells with probabilities $p_1, p_2, \ldots, p_k$, where $\sum_{i=1}^{k} p_i = 1$.
Let $n_1, n_2, \ldots, n_k$, with $\sum_{i=1}^{k} n_i = n$, be the observed frequencies in those
cells in n trials. Then the likelihood function is

(1)    $L(p) = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$

Since $\sum p_i = 1$, there are only $k - 1$ independent parameters here;
consequently in maximizing $L(p)$ by calculus methods it is necessary to keep
this fact in mind. In this connection, it is convenient to choose $p_k$ as the
parameter to be expressed in terms of the remaining parameters. Taking
logarithms and differentiating with respect to $p_i$ will yield

$\log L(p) = n_1 \log p_1 + n_2 \log p_2 + \cdots + n_k \log p_k$

$\dfrac{\partial \log L(p)}{\partial p_i} = \dfrac{n_i}{p_i} + \dfrac{n_k}{p_k} \dfrac{\partial p_k}{\partial p_i} = \dfrac{n_i}{p_i} - \dfrac{n_k}{p_k}$




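For a maximum it is necessary that all $k - 1$ partial derivatives vanish;
hence it is necessary that

(2)    $\dfrac{n_i}{p_i} = \dfrac{n_k}{p_k}$

If the maximum likelihood estimate of $p_i$ is denoted by $\hat{p}_i$ it follows from
(2) that

(3)    $\hat{p}_i = \dfrac{\hat{p}_k}{n_k} n_i, \quad i = 1, 2, \ldots, k$

Since these estimates must satisfy the restriction $\sum \hat{p}_i = 1$, summing both
sides of (3) will yield

$1 = \dfrac{\hat{p}_k}{n_k} n$

If this result is applied to (3), it will follow that

(4)    $\hat{p}_i = \dfrac{n_i}{n}, \quad i = 1, 2, \ldots, k$

Now consider the likelihood ratio test for testing the hypothesis

$H_0: p_i = p_{i0}, \quad i = 1, 2, \ldots, k$

Since there are no unspecified parameters remaining when $H_0$ is true, it
follows from (1) and (4) that the likelihood ratio here is given by

$\lambda = \dfrac{p_{10}^{n_1} p_{20}^{n_2} \cdots p_{k0}^{n_k}}{\hat{p}_1^{n_1} \hat{p}_2^{n_2} \cdots \hat{p}_k^{n_k}}$
$= \left(\dfrac{n p_{10}}{n_1}\right)^{n_1} \left(\dfrac{n p_{20}}{n_2}\right)^{n_2} \cdots \left(\dfrac{n p_{k0}}{n_k}\right)^{n_k}$
$= \left(\dfrac{e_1}{n_1}\right)^{n_1} \left(\dfrac{e_2}{n_2}\right)^{n_2} \cdots \left(\dfrac{e_k}{n_k}\right)^{n_k}$

where $e_i = n p_{i0}$. As a result

(5)    $-2 \log \lambda = -2 \sum_{i=1}^{k} n_i \log \dfrac{e_i}{n_i}$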





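Now let $x_i = n_i - e_i$, which is the difference between the observed
frequency and the expected frequency in the ith cell. Then (5) may be
expressed in the following form.

(6)    $-2 \log \lambda = -2 \sum (x_i + e_i) \log \dfrac{e_i}{x_i + e_i}$
       $= 2 \sum (x_i + e_i) \log \left(1 + \dfrac{x_i}{e_i}\right)$
       $= 2 \sum (x_i + e_i) \left[\dfrac{x_i}{e_i} - \dfrac{1}{2}\left(\dfrac{x_i}{e_i}\right)^2 + \dfrac{1}{3}\left(\dfrac{x_i}{e_i}\right)^3 - \cdots\right]$

The variable $n_i$ is a binomial variable with mean $\mu_i = n p_{i0} = e_i$ and
variance $\sigma_i^2 = n p_{i0}(1 - p_{i0}) = e_i(1 - p_{i0})$; consequently the variable
$x_i / e_i$ may be expressed in the form

(7)    $\dfrac{x_i}{e_i} = \dfrac{n_i - \mu_i}{e_i} = \dfrac{n_i - \mu_i}{\sigma_i} \sqrt{\dfrac{1 - p_{i0}}{n p_{i0}}}$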


Now from Theorem 2, Chapter 5, the variable $(n_i - \mu_i)/\sigma_i$ has a
distribution approaching that of a standard normal variable as $n \to \infty$, whereas
the square root factor in (7) approaches zero at the same rate as $1/\sqrt{n}$.
Thus, for large n, $x_i/e_i$ will almost certainly be very small and of the order
of $1/\sqrt{n}$; consequently the successive terms in the above expansion will
be of order $1/\sqrt{n}$ times the preceding term. As a result, the large sample
approximate value of $-2 \log \lambda$ is given by the first term in (6). Thus

$-2 \log \lambda \approx \sum_{i=1}^{k} \dfrac{x_i^2}{e_i} = \sum_{i=1}^{k} \dfrac{(n_i - e_i)^2}{e_i}$
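
To make the step from (6) to this approximation explicit, the series can be multiplied out term by term. The following is a sketch of that calculation; the only additional fact it uses is that $\sum x_i = \sum n_i - \sum e_i = n - n = 0$, so the terms linear in $x_i$ drop out.

```latex
\begin{aligned}
2\sum_{i=1}^{k} (x_i + e_i)\left[\frac{x_i}{e_i}
   - \frac{1}{2}\left(\frac{x_i}{e_i}\right)^2
   + \frac{1}{3}\left(\frac{x_i}{e_i}\right)^3 - \cdots\right]
&= 2\sum_{i=1}^{k}\left[x_i + \frac{x_i^2}{2e_i} - \frac{x_i^3}{6e_i^2} + \cdots\right] \\
&= \sum_{i=1}^{k}\frac{x_i^2}{e_i} - \frac{1}{3}\sum_{i=1}^{k}\frac{x_i^3}{e_i^2} + \cdots
\end{aligned}
```

Each term after the first carries an extra factor of order $x_i/e_i$, which is the sense in which later terms are of order $1/\sqrt{n}$ times the preceding one.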


Since from (12), Chapter 9, it is known that $-2 \log \lambda$ possesses an
approximate $\chi^2$ distribution, this derivation shows that the quantity
$\sum (n_i - e_i)^2 / e_i$ possesses an approximate $\chi^2$ distribution. With a little
more attention to details the preceding derivation can be made to yield a
mathematical theorem, which essentially states in the language of limiting
distributions what has been said here concerning approximate distributions.
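
The closeness of the two statistics can be checked numerically. The sketch below evaluates formula (5) and the approximating sum for one set of cell frequencies. The counts, the hypothesized probabilities, and the function names are illustrative assumptions rather than data from the text.

```python
import math

def neg2_log_lambda(observed, probs0):
    """Formula (5): -2 log(lambda) = -2 * sum of n_i * log(e_i / n_i)."""
    n = sum(observed)
    total = 0.0
    for n_i, p_i0 in zip(observed, probs0):
        e_i = n * p_i0
        if n_i > 0:                      # a cell with n_i = 0 adds nothing to the sum
            total += n_i * math.log(e_i / n_i)
    return -2.0 * total

def pearson_sum(observed, probs0):
    """The approximating quantity: sum of (n_i - e_i)^2 / e_i."""
    n = sum(observed)
    return sum((n_i - n * p) ** 2 / (n * p) for n_i, p in zip(observed, probs0))

observed = [18, 22, 25, 15, 20, 20]      # hypothetical counts for 120 rolls of a die
probs0 = [1 / 6] * 6                     # H_0: all faces equally likely
lr_stat = neg2_log_lambda(observed, probs0)
chi_stat = pearson_sum(observed, probs0)
```

For these counts the two values come out close to one another (roughly 2.93 and 2.90), as the large-sample argument suggests they should.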




3. Cramer-Rao Inequality 

Consider the problem of how to find the best unbiased estimate of the
parameter $\theta$ in the continuous frequency function $f(x; \theta)$. The solution
of the problem lies in obtaining an inequality for the variance of any
unbiased estimator $t = t(x_1, x_2, \ldots, x_n)$. This inequality is derived
in the following manner.

Since $x_1, x_2, \ldots, x_n$ is a random sample from $f(x; \theta)$, its frequency
function, which for brevity of notation is denoted by L, is given by

$L = \prod_{i=1}^{n} f(x_i; \theta)$

It therefore follows that

(1)    $\int \cdots \int L \, dx_1 \cdots dx_n = 1$

Since $t = t(x_1, x_2, \ldots, x_n)$ is assumed to be an unbiased estimator of $\theta$,
it follows that

(2)    $E[t] = \int \cdots \int t L \, dx_1 \cdots dx_n = \theta$


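Formulas (1) and (2) are identities in $\theta$; therefore they may be
differentiated with respect to $\theta$. In doing so, it will be assumed that it is
permissible to differentiate under the integral sign and that the limits of
integration do not depend on $\theta$. Differentiation of (1) will give

(3)    $\int \cdots \int \dfrac{\partial L}{\partial \theta} \, dx_1 \cdots dx_n = 0$

Differentiation of (2) yields

(4)    $\int \cdots \int t \dfrac{\partial L}{\partial \theta} \, dx_1 \cdots dx_n = 1$

The value of $\partial L / \partial \theta$ is most easily obtained by calculating

$\dfrac{\partial \log L}{\partial \theta} = \dfrac{1}{L} \dfrac{\partial L}{\partial \theta}$

Thus

$\dfrac{\partial L}{\partial \theta} = L \sum_{i=1}^{n} \dfrac{\partial \log f(x_i; \theta)}{\partial \theta}$

To simplify the notation somewhat, let

(5)    $T = \sum_{i=1}^{n} \dfrac{\partial \log f(x_i; \theta)}{\partial \theta}$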




Equation (3) can now be expressed as follows.

(6)    $0 = \int \cdots \int T L \, dx_1 \cdots dx_n = E[T]$

Similarly, equation (4) will assume the form

(7)    $1 = \int \cdots \int t T L \, dx_1 \cdots dx_n = E[tT]$

Next, consider the value of the correlation coefficient between the two
random variables t and T. From formula (13), Chapter 8, it may be
written as

$\rho_{tT} = \dfrac{E[tT] - E[t]E[T]}{\sigma_t \sigma_T}$

In view of the results in (6) and (7), this will reduce to

(8)    $\rho_{tT} = \dfrac{1}{\sigma_t \sigma_T}$

Since any correlation coefficient satisfies the inequality $\rho^2 \le 1$, it follows
from (8) that $\sigma_t^2$ and $\sigma_T^2$ must satisfy the inequality

(9)    $\sigma_t^2 \ge \dfrac{1}{\sigma_T^2}$


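In view of (5) and the independence of the terms in that sum, it follows
that

(10)    $\sigma_T^2 = \sum_{i=1}^{n} \sigma_i^2$

where $\sigma_i^2$ is the variance of $\partial \log f(x_i; \theta) / \partial \theta$. But from (5) and (6)

$E[T] = \sum_{i=1}^{n} E\left[\dfrac{\partial \log f(x_i; \theta)}{\partial \theta}\right] = 0$

Since the $x_i$ possess the same distribution, the quantities $\partial \log f(x_i; \theta) / \partial \theta$,
$i = 1, 2, \ldots, n$, must possess the same distribution, hence the same
expected value. Since the sum of such expected values is zero, it follows
that each expected value must be zero and therefore the variance
of $\partial \log f(x_i; \theta) / \partial \theta$ is equal to its second moment. Thus

$\sigma_i^2 = E\left[\left(\dfrac{\partial \log f(x_i; \theta)}{\partial \theta}\right)^2\right]$

Consequently, from (10),

$\sigma_T^2 = \sum_{i=1}^{n} E\left[\left(\dfrac{\partial \log f(x_i; \theta)}{\partial \theta}\right)^2\right] = n E\left[\left(\dfrac{\partial \log f(x; \theta)}{\partial \theta}\right)^2\right]$

because each $x_i$ has the same distribution as the basic variable x. When
this result is substituted in (9), one will obtain the desired inequality,
namely

(11)    $\sigma_t^2 \ge \dfrac{1}{n E\left[\left(\dfrac{\partial \log f(x; \theta)}{\partial \theta}\right)^2\right]}$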

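Since a best unbiased estimate is by definition one with minimum
variance, it follows that if one can find an unbiased estimate whose
variance is equal to the quantity on the right of (11) he will have found a
best unbiased estimate.

This formula can be used to show that $\bar{x}$ is a best unbiased estimate
for the mean of a normal distribution. Toward this end, write

$f(x; \mu) = \dfrac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$

and assume that the value of $\sigma$ is known. Then

$\log f(x; \mu) = -\log(\sigma \sqrt{2\pi}) - \dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2$

Hence

$\dfrac{\partial \log f(x; \mu)}{\partial \mu} = \dfrac{x - \mu}{\sigma^2}$

$E\left[\left(\dfrac{\partial \log f(x; \mu)}{\partial \mu}\right)^2\right] = \dfrac{1}{\sigma^4} E(x - \mu)^2 = \dfrac{1}{\sigma^2}$

Substituting this result in (11) will yield

$\sigma_t^2 \ge \dfrac{\sigma^2}{n}$

But it is known that $\sigma_{\bar{x}}^2 = \sigma^2 / n$; therefore $\bar{x}$ must be a best unbiased
estimate of $\mu$ for a normal distribution.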
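
The normal example can also be checked numerically. The sketch below approximates $E[(\partial \log f / \partial \mu)^2]$ by the trapezoidal rule and compares the resulting bound in (11) with the variance of the sample mean. The parameter values, the integration range, and the function names are assumptions chosen for illustration.

```python
import math

def normal_density(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def score(x, mu, sigma):
    # d log f / d mu for the normal density with sigma known
    return (x - mu) / sigma ** 2

def information(mu, sigma, half_width=10.0, steps=20000):
    """Approximate E[(d log f / d mu)^2] by the trapezoidal rule on
    [mu - half_width*sigma, mu + half_width*sigma]."""
    lo = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for j in range(steps + 1):
        x = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0   # end points get half weight
        total += weight * score(x, mu, sigma) ** 2 * normal_density(x, mu, sigma)
    return total * h

mu, sigma, n = 5.0, 2.0, 25              # hypothetical values
info = information(mu, sigma)            # should be close to 1 / sigma**2
lower_bound = 1.0 / (n * info)           # right side of (11)
var_of_mean = sigma ** 2 / n             # variance of the sample mean
```

With these values the computed bound and the variance of $\bar{x}$ agree to within the accuracy of the numerical integration, which is what the derivation above asserts.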


4. Transformations and Jacobians 

Geometrically, the functions $u = u(x, y)$ and $v = v(x, y)$ represent a
transformation from the coordinate system x,y to the coordinate system
u,v. Now there exists a calculus formula that enables one to evaluate the
integral of the function $f(x, y)$ over a region R in the x,y plane by means of
the proper integral over a corresponding region R' in the u,v plane. This
formula is

(1)    $\iint_{R} f(x, y) \, dx \, dy = \iint_{R'} f(x, y) |J| \, du \, dv$

where the quantity J, called the Jacobian of the transformation, is given
by the formula

(2)    $J = \dfrac{1}{\begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\ \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}}$


The region of integration R' on the right is the region in the u,v plane
that corresponds to the region R in the x,y plane. It is understood that the
variables x and y in the right integrand of (1) are to be replaced by their
values in terms of u and v by solving the relations $u = u(x, y)$ and
$v = v(x, y)$ for x and y. It will be assumed that these functions are such that
each point in the x,y plane corresponds to exactly one point in the u,v
plane, and conversely.

The integral on the left of (1) yields the probability that the sample
point x,y will lie in the region R. Because of the one-to-one correspondence
between points in the two coordinate systems, this can occur if, and
only if, the sample point u,v lies in the corresponding region R';
consequently the integral on the right must yield the probability that the
sample point u,v will lie in the region R'. Now formula (1) holds for all
possible regions R in the x,y plane, hence for all possible regions R' in
the u,v plane; consequently the integrand in the integral on the right side
of (1) must be the frequency function of the random variables u and v.
Thus, denoting this function by $g(u, v)$, it follows that

(3)    $g(u, v) = f(x, y) |J|$

where J is given by (2).

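As an illustration, consider the problem solved earlier in (2), Chapter
11. Here one may choose $u = z = t(x, y)$ and $v = x$. Then (2) becomes

$J = \dfrac{1}{\begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\ 1 & 0 \end{vmatrix}} = -\dfrac{1}{\dfrac{\partial z}{\partial y}}$

Application of (3) then yields

$g(u, v) = \dfrac{f(x, y)}{\left|\dfrac{\partial z}{\partial y}\right|}$

which is the result given earlier.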





The method that has just been explained for finding the frequency
function of two transformed variables u and v can be generalized to any
number of variables. The formula that results is a consequence of
probability considerations applied to the formula for evaluating a multiple
integral by means of a new coordinate system. The following theorem,
in which the functions are assumed to satisfy certain regularity conditions,
yields the desired general result for k variables.

Theorem: If the continuous variables $x_1, x_2, \ldots, x_k$ possess the frequency
function $f(x_1, x_2, \ldots, x_k)$ and the transformed variables
$u_i = u_i(x_1, x_2, \ldots, x_k)$, $i = 1, 2, \ldots, k$, yield a one-to-one transformation of the two
coordinate systems, the frequency function of the u's will be given by the
formula

(4)    $g(u_1, u_2, \ldots, u_k) = f(x_1, x_2, \ldots, x_k) |J|$

where

$J = \dfrac{1}{\begin{vmatrix} \dfrac{\partial u_1}{\partial x_1} & \cdots & \dfrac{\partial u_1}{\partial x_k} \\ \vdots & & \vdots \\ \dfrac{\partial u_k}{\partial x_1} & \cdots & \dfrac{\partial u_k}{\partial x_k} \end{vmatrix}}$

and where the x's on the right of (4) are to be replaced by their values in
terms of the u's by solving the relations $u_i = u_i(x_1, x_2, \ldots, x_k)$ for the x's.
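
A small numerical instance of formula (3) may help fix ideas. The sketch below takes x and y to be independent standard normal variables and uses the transformation $u = x + y$, $v = x - y$. This example, together with the function names and the test point, is an assumption chosen for illustration and is not taken from the text. The result is compared with the known fact that u and v are then independent normal variables with zero means and variance 2.

```python
import math

def f(x, y):
    """Joint frequency function of two independent standard normal variables."""
    return math.exp(-0.5 * (x * x + y * y)) / (2 * math.pi)

def g(u, v):
    """Frequency function of u = x + y, v = x - y obtained from formula (3)."""
    # Solve the relations u = x + y, v = x - y for x and y.
    x = (u + v) / 2
    y = (u - v) / 2
    # Determinant of the partial derivatives of (u, v) with respect to (x, y):
    # | 1  1 |
    # | 1 -1 |  = -2, so by formula (2) J = 1 / (-2).
    jacobian = 1 / (1 * (-1) - 1 * 1)
    return f(x, y) * abs(jacobian)

def normal_density(w, variance):
    return math.exp(-0.5 * w * w / variance) / math.sqrt(2 * math.pi * variance)

u, v = 0.7, -1.3                         # arbitrary test point
from_formula = g(u, v)
direct = normal_density(u, 2.0) * normal_density(v, 2.0)
```

The two values agree, and the factored form of the result reflects the independence of u and v in this particular case.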

5. Independence of $\bar{x}$ and $s^2$ for Normal Distributions

Consider the n independent normal variables $x_1, x_2, \ldots, x_n$ with means
$\mu_1, \mu_2, \ldots, \mu_n$ and the common variance $\sigma^2$. Their joint frequency
function is given by

(1)    $f(x_1, x_2, \ldots, x_n) = \dfrac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2} \sum (x_i - \mu_i)^2}$

Now $\sum (x_i - \mu_i)^2 = c^2$ is the equation of a sphere in n dimensions with
center at the point $(\mu_1, \mu_2, \ldots, \mu_n)$ and with radius c. As a consequence,
the geometrical interpretation of (1) is that the probability density is
constant on the surface of any sphere with center at $(\mu_1, \mu_2, \ldots, \mu_n)$ and
the magnitude of the density for any point on such a sphere is given by
replacing $\sum (x_i - \mu_i)^2$ in (1) by the square of the radius of the sphere.
These two geometrical properties completely determine the distribution of
$x_1, x_2, \ldots, x_n$. A sketch illustrating these properties is shown in Fig. 1
where x and $\mu$ denote the sample point $(x_1, x_2, \ldots, x_n)$ and the mean
point $(\mu_1, \mu_2, \ldots, \mu_n)$, respectively.

Fig. 1. Distribution of n independent normal variables.

Now suppose one rotates the axes of this coordinate system in any
desired manner. If the new axes are denoted by $y_1, y_2, \ldots, y_n$, as
indicated in Fig. 1, the equation of the sphere sketched there will become
$\sum (y_i - \nu_i)^2 = c^2$, where $(\nu_1, \nu_2, \ldots, \nu_n)$ denotes the coordinates of the
mean point $\mu$ in terms of the new coordinate system. The typical sample
point $x = (x_1, x_2, \ldots, x_n)$ becomes the point $y = (y_1, y_2, \ldots, y_n)$ in the
new system. The y's are random variables because they are functions of
the random variables $x_1, x_2, \ldots, x_n$. Since the only effect of rotating
axes is to change the coordinates of the mean of the distribution, the
geometrical properties of the distribution being considered show that
the distribution of the new variables $y_1, y_2, \ldots, y_n$ must be given by the
frequency function

(2)    $g(y_1, y_2, \ldots, y_n) = \dfrac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2} \sum (y_i - \nu_i)^2}$

The preceding discussion will now be specialized to the case in which
$\mu_1 = \mu_2 = \cdots = \mu_n = \mu$ and in which one rotates the axes in such a way that
the $y_1$ axis becomes the line that makes equal angles with the positive x
axes, as shown in Fig. 2.

In the new coordinate system the mean point $\mu$ will be on the $y_1$ axis,
and therefore its coordinates with respect to the new axes are given by
$(\nu, 0, \ldots, 0)$, where $\nu$ is the distance from the origin to the point $\mu$ in
the old coordinate system. It is clear from Figs. 1 and 2 that this distance
is given by $\nu = \sqrt{\mu_1^2 + \mu_2^2 + \cdots + \mu_n^2} = \mu\sqrt{n}$, where $\mu$ here denotes
the common numerical value of the equal-valued $\mu_i$. As a result, formula
(2) may be applied to give

(3)    $g(y_1, y_2, \ldots, y_n) = \dfrac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2}\left[(y_1 - \mu\sqrt{n})^2 + \sum_{i=2}^{n} y_i^2\right]}$

Thus the y's are independent normal variables with a common variance
$\sigma^2$, and all have zero means except $y_1$, which has a mean of $\mu\sqrt{n}$.

Now consider the geometrical meaning of the equation defining the
sample mean, namely

(4)    $x_1 + x_2 + \cdots + x_n = n\bar{x}$

This is the equation of a plane in n dimensions. Since the coefficients of
the variables $x_1, x_2, \ldots, x_n$ are direction numbers of a normal
(perpendicular) to the plane and since all these coefficients are equal, it follows
that the $y_1$ axis is a normal, hence perpendicular, to this plane because
the $y_1$ axis makes equal angles with the positive x axes. Further, since
the coordinates of the point $(\bar{x}, \bar{x}, \ldots, \bar{x})$ satisfy equation (4) and since
this point lies on the $y_1$ axis, it follows that the point of intersection of
this plane with the $y_1$ axis is the point $(\bar{x}, \bar{x}, \ldots, \bar{x})$, which has been
labeled $\bar{x}$ in Fig. 2. The point x lies in this plane also.

Now $ns^2 = \sum (x_i - \bar{x})^2$ is the square of the distance between the points
labeled x and $\bar{x}$ in Fig. 2. In the new coordinate system the square of
this distance is given by $y_2^2 + y_3^2 + \cdots + y_n^2$ because the points x
and $\bar{x}$ possess the coordinates $(y_1, y_2, \ldots, y_n)$ and $(y_1, 0, \ldots, 0)$,
respectively, in the new coordinate system. Thus

$ns^2 = y_2^2 + y_3^2 + \cdots + y_n^2$

Fig. 2. Transformation of coordinates.





Furthermore, since the distance from the origin to the plane given by (4)
is $\sqrt{n}\,\bar{x}$ and $y_1$, respectively, in the two coordinate systems, it follows that

$\bar{x} = \dfrac{y_1}{\sqrt{n}}$

This shows that $\bar{x}$ is a function of the variable $y_1$ only and that $s^2$ is a
function of the variables $y_2, y_3, \ldots, y_n$ only. Since the y's are independent
random variables, it therefore follows that $\bar{x}$ and $s^2$ must be independent
random variables.
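
The two geometric identities used above can be checked numerically for a particular choice of new axes. The sketch below uses the Helmert matrix, one standard orthogonal transformation whose first axis makes equal angles with the x axes. The text does not name a specific rotation, so this choice, the sample values, and the function names are assumptions made for illustration. An orthogonal transformation of this kind preserves distances, which is all the argument requires.

```python
import math

def helmert_matrix(n):
    """Rows form an orthonormal basis; the first row is (1/sqrt(n), ..., 1/sqrt(n))."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        a = 1.0 / math.sqrt(k * (k - 1))
        # k-1 equal entries, then one balancing entry, then zeros
        rows.append([a] * (k - 1) + [-(k - 1) * a] + [0.0] * (n - k))
    return rows

def new_coordinates(sample):
    """Coordinates y_1, ..., y_n of the sample point with respect to the new axes."""
    n = len(sample)
    h = helmert_matrix(n)
    return [sum(h[i][j] * sample[j] for j in range(n)) for i in range(n)]

sample = [2.0, 5.0, 3.0, 8.0, 7.0]       # hypothetical sample values
n = len(sample)
x_bar = sum(sample) / n
n_s2 = sum((x - x_bar) ** 2 for x in sample)   # n * s^2

y = new_coordinates(sample)
y1_check = math.sqrt(n) * x_bar          # should equal y[0]
rest_check = sum(v * v for v in y[1:])   # should equal n * s^2
```

Here `y[0]` reproduces $\sqrt{n}\,\bar{x}$ and the sum of squares of the remaining coordinates reproduces $ns^2$, so $\bar{x}$ depends on $y_1$ alone and $s^2$ on $y_2, \ldots, y_n$ alone.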



APPENDIX 2 

Tables 






Table I. Squares and Square Roots 


N

N²

√N

√(10N)

N

N²

√N

√(10N)

1.00 

1.0000 

1.00000 

3.16228 

1.50 

2.2500 

1.22474 

3.87298 

1.01 

1.02 

1.03 

1.0201 

1.0404 

1.0609 

1.00499 

1.00995 

1.01489 

3.17805 

3.19374 

3.20936 

1.51 

1.52 

1.53 

2.2801 

2.3104

2.3409 

1.22882 

1.23288 

1.23693 

3.88587

3.89872

3.91152

1.04 

1.05 

1.06 

1.0816 

1.1025

1.1236 

1.01980 

1.02470 

1.02956 

3.22490 

3.24037 

3.25576 

1.54 

1.55 

1.56 

2.3716 

2.4025 

2.4336 

1.24097 

1.24499 

1.24900 

3.92428

3.93700 

3.94968 

1.07 

1.08

1.09 

1.1449 

1.1664 

1.1881 

1.03441 

1.03923 

1.04403 

3.27109 

3.28634 

3.30151 

1.57 

1.58 

1.59 

2.4649 

2.4964 

2.5281 

1.25300 

1.25698 

1.26095 

3.96232

3.97492 

3.98748 

1.10 

1.2100 

1.04881 

3.31662 

1.60 

2.5600 

1.26491 

4.00000 

1.11 

1.12 

1.13 

1.2321 

1.2544 

1.2769 

1.05357 

1.05830 

1.06301

3.33167 

3.34664 

3.36155 

1.61 

1.62 

1.63 

2.5921 

2.6244 

2.6569 

1.26886 

1.27279

1.27671 

4.01248 

4.02492 

4.03733 

1.14 

1.15 
1.16

1.2996 

1.3225 

1.3456 

1.06771 

1.07238 

1.07703 

3.37639 

3.39116 

3.40588 

1.64 

1.65 

1.66 

2.6896 

2.7225 

2.7556 

1.28062 

1.28452 

1.28841 

4.04969 

4.06202 

4.07431 

1.17 

1.18 
1.19 

1.3689

1.3924

1.4161

1.08167 

1.08628 

1.09087 

3.42053 

3.43511 

3.44964 

1.67 

1.68 
1.69 

2.7889 

2.8224 

2.8561 

1.29228 

1.29615 

1.30000 

4.08656 

4.09878 

4.11096 

1.20 

1.4400 

1.09545 

3.46410 

1.70 

2.8900 

1.30384 

4.12311

1.21 

1.22 

1.23 

1.4641 

1.4884 

1.5129 

1.10000 

1.10454 

1.10905 

3.47851 

3.49285 

3.50714 

1.71 

1.72 

1.73 

2.9241 

2.9584 

2.9929 

1.30767 

1.31149 

1.31529 

4.13521

4.14729 

4.15933 

1.24 

1.25 

1.26 

1.5376

1.5625 

1.5876 

1.11355 

1.11803 

1.12250 

3.52136 

3.53553 

3.54965 

1.74 

1.75 

1.76 

3.0276 

3.0625 

3.0976 

1.31909 

1.32288 

1.32665 

4.17133

4.18330

4.19524

1.27 

1.28 
1.29 

1.6129 

1.6384 

1.6641 

1.12694 

1.13137 

1.13578 

3.56371 

3.57771 

3.59166 

1.77 

1.78 

1.79 

3.1329 

3.1684 

3.2041 

1.33041 

1.33417 

1.33791 

4.20714 

4.21900 

4.23084 

1.30 

1.6900 

1.14018 

3.60555 

1.80 

3.2400 

1.34164 

4.24264 

1.31 

1.32 

1.33 

1.7161 

1.7424 

1.7689 

1.14455 

1.14891 

1.15326 

3.61939 

3.63318 

3.64692 

1.81 

1.82

1.83 

3.2761 

3.3124 

3.3489 

1.34536 

1.34907 

1.35277 

4.25441 

4.26615 

4.27785 

1.34 

1.35 

1.36 

1.7956 

1.8225

1.8496 

1.15758 

1.16190 

1.16619 

3.66060 

3.67423 

3.68782 

1.84 

1.85 

1.86 

3.3856 

3.4225 

3.4596 

1.35647 

1.36015 

1.36382 

4.28952 

4.30116 

4.31277 

1.37 

1.38 

1.39 

1.8769 

1.9044 

1.9321 

1.17047 

1.17473 

1.17898 

3.70135 

3.71484 

3.72827 

1.87 

1.88 
1.89 

3.4969 

3.5344 

3.5721 

1.36748 

1.37113 

1.37477 

4.32435 

4.33590 

4.34741

1.40 

1.9600 

1.18322 

3.74166 

1.90 

3.6100 

1.37840

4.35890 

1.41 

1.42 

1.43 

1.9881 

2.0164 

2.0449 

1.18743 

1.19164 

1.19583 

3.75500 

3.76829 

3.78153 

1.91 

1.92 
1.93 

3.6481 

3.6864 

3.7249 

1.38203 

1.38564 

1.38924 

4.37035 

4.38178 

4.39318 

1.44 

1.45 

1.46 

2.0736

2.1025 

2.1316 

1.20000 

1.20416 

1.20830 

3.79473 

3.80789 

3.82099 

1.94 

1.95 

1.96 

3.7636 

3.8025 

3.8416 

1.39284 

1.39642 

1.40000 

4.40454 

4.41588

4.42719 

1.47 

1.48 

1.49 

2.1609 

2.1904 

2.2201 

1.21244 

1.21655 

1.22066 

3.83406 

3.84708 

3.86005 

1.97 

1.98 

1.99 

3.8809 

3.9204 

3.9601 

1.40357 

1.40712 

1.41067 

4.43847 

4.44972 

4.46094 

1.50 

2.2500 

1.22474 

3.87298 

2.00 

4.0000 

1.41421 

4.47214




Squares and Square Roots (Continued) 


N

N²

√N

√(10N)

2.00 

4.0000 

1.41421 

4.47214 

2.01 

2.02 

2.03 

4.0401 

4.0804 

4.1209 

1.41774 

1.42127 

1.42478 

4.48330 

4.49444 

4.50555 

2.04 

2.05 

2.06 

4.1616 

4.2025 

4.2436 

1.42829 

1.43178 

1.43527 

4.51664 

4.52769 

4.53872 

2.07 

2.08 
2.09 

4.2849 

4.3264 

4.3681 

1.43875 

1.44222 

1.44568 

4.54973 

4.56070 

4.57165 

2.10 

4.4100 

1.44914 

4.58258 

2.11 

2.12 

2.13 

4.4521 

4.4944 

4.5369 

1.45258 

1.45602 

1.45945 

4.59347 

4.60435 

4.61519 

2.14 

2.15 

2.16 

4.5796 

4.6225 

4.6656 

1.46287 

1.46629 

1.46969 

4.62601 

4.63681 

4.64758 

2.17 

2.18 
2.19 

4.7089 

4.7524 

4.7961 

1.47309 

1.47648 

1.47986 

4.65833 

4.66905 

4.67974 

2.20 

4.8400 

1.48324

4.69042 

2.21 

2.22 

2.23 

4.8841 

4.9284 

4.9729 

1.48661 

1.48997 

1.49332

4.70106 

4.71169 

4.72229 

2.24 

2.25 

2.26 

5.0176 

5.0625 

5.1076 

1.49666 

1.50000 

1.50333 

4.73286 

4.74342 

4.75395

2.27 

2.28 
2.29 

5.1529 

5.1984 

5.2441 

1.50665 

1.50997 

1.51327 

4.76445 

4.77493 

4.78539 

2.30 

5.2900 

1.51658 

4.79583 

2.31 

2.32 

2.33 

5.3361 

5.3824 

5.4289 

1.51987 

1.52315 

1.52643 

4.80625 

4.81664 

4.82701

2.34 

2.35 

2.36 

5.4756 

5.5225 

5.5696 

1.52971 

1.53297 

1.53623 

4.83735 

4.84768 

4.85798

2.37 

2.38 

2.39 

5.6169 

5.6644 

5.7121 

1.53948 

1.54272

1.54596 

4.86826 

4.87852 

4.88876 

2.40 

5.7600 

1.54919 

4.89898 

2.41 

2.42 

2.43 

5.8081 

5.8564 

5.9049 

1.55242 

1.55563 

1.55885 

4.90918 

4.91935 

4.92950 

2.44 

2.45 

2.46 

5.9536 

6.0025 

6.0516 

1.56205 

1.56525 

1.56844 

4.93964 

4.94975 

4.95984 

2.47 

2.48 

2.49 

6.1009 

6.1504 

6.2001 

1.57162 

1.57480 

1.57797 

4.96991 

4.97996 

4.98999 

2.50 

6.2500 

1.58114 

5.00000 

N

N²

√N

√(10N)

2.50 

6.2500 

1.58114 

5.00000 

2.51 

2.52 

2.53 

6.3001 

6.3504 

6.4009 

1.58430 

1.58745

1.59060 

5.00999 

5.01996 

5.02991 

2.54 

2.55 

2.56 

6.4516 

6.5025 

6.5536 

1.59374 

1.59687 

1.60000 

5.03984 

5.04975 

5.05964 

2.57 

2.58 

2.59 

6.6049

6.6564

6.7081 

1.60312 

1.60624 

1.60935 

5.06952 

5.07937 

5.08920

2.60 

6.7600 

1.61245 

5.09902 

2.61 

2.62 

2.63 

6.8121 

6.8644 

6.9169 

1.61555 

1.61864 

1.62173 

5.10882 

5.11859 

5.12835 

2.64 

2.65 

2.66 

6.9696 

7.0225 

7.0756 

1.62481 

1.62788 

1.63095 

5.13809 

5.14782 

5.15752 

2.67 

2.68 
2.69 

7.1289 

7.1824 

7.2361 

1.63401 

1.63707 

1.64012 

5.16720 

5.17687 

5.18652 

2.70 

7.2900 

1.64317 

5.19615 

2.71 

2.72 

2.73 

7.3441 

7.3984 

7.4529 

1.64621 

1.64924 

1.65227 

5.20577 

5.21536 

5.22494 

2.74 

2.75 

2.76 

7.5076 

7.5625 

7.6176 

1.65529 

1.65831 

1.66132 

5.23450 

5.24404 

5.25357 

2.77 

2.78 

2.79 

7.6729 

7.7284

7.7841 

1.66433 

1.66733 

1.67033 

5.26308 

5.27257 

5.28205 

2.80 

7.8400 

1.67332 

5.29150 

2.81 

2.82 

2.83 

7.8961 

7.9524 

8.0089 

1.67631 

1.67929 

1.68226 

5.30094 

5.31037 

5.31977 

2.84 

2.85 

2.86 

8.0656 

8.1225 

8.1796 

1.68523 

1.68819

1.69115

5.32917 

5.33854 

5.34790 

2.87 

2.88 
2.89 

8.2369 
8.2944 
8.3521

1.69411 

1.69706 

1.70000 

5.35724 

5.36656 

5.37587 

2.90 

8.4100

1.70294 

5.38516 

2.91 

2.92 

2.93 

8.4681 

8.5264 

8.5849 

1.70587 

1.70880 

1.71172 

5.39444 

5.40370 

5.41295 

2.94 

2.95 

2.96 

8.6436

8.7025 

8.7616 

1.71464 

1.71756 

1.72047 

5.42218 

5.43139 

5.44059 

2.97 

2.98 

2.99 

8.8209 

8.8804 

8.9401 

1.72337 

1.72627 

1.72916 

5.44977 

5.45894 

5.46809 

3.00 

9.0000 

1.73205 

5.47723 




Squares and Square Roots (Continued)


N

N²

√N

√(10N)

3.00 

9.0000 

1.73205 

5.47723 

3.01 

3.02 

3.03 

9.0601 

9.1204 

9.1809 

1.73494 

1.73781 

1.74069 

5.48635 

5.49545

5.50454 

3.04 

3.05 

3.06 

9.2416 

9.3025 

9.3636 

1.74356 

1.74642 

1.74929 

5.51362 

5.52268 

5.53173

3.07 

3.08 

3.09 

9.4249 

9.4864 

9.5481 

1.75214 

1.75499

1.75784 

5.54076 

5.54977 

5.55878

3.10 

9.6100 

1.76068 

5.56776 

3.11 

3.12 

3.13 

9.6721

9.7344 

9.7969 

1.76352 

1.76635 

1.76918

5.57674 

5.58570 

5.59464

3.14 

3.15 

3.16 

9.8596 

9.9225 

9.9856 

1.77200 

1.77482 

1.77764

5.60357

5.61249

5.62139 

3.17 

3.18 

3.19 

10.0489 

10.1124 

10.1761 

1.78045 

1.78326 

1.78606 

5.63028

5.63915 

5.64801 

3.20 

10.2400 

1.78885 

5.65685 

3.21 

3.22 

3.23 

10.3041 

10.3684 

10.4329 

1.79165

1.79444 

1.79722 

5.66569 

5.67450 

5.68331 

3.24 

3.25 

3.26 

10.4976 

10.5625 

10.6276 

1.80000 

1.80278 

1.80555 

5.69210 

5.70088 

5.70964 

3.27 

3.28 

3.29 

10.6929 

10.7584 

10.8241 

1.80831 

1.81108 

1.81384 

5.71839 

5.72713 

5.73585 

3.30 

10.8900 

1.81659 

5.74456 

3.31 

3.32 

3.33 

10.9561 

11.0224

11.0889

1.81934

1.82209 

1.82483 

5.75326 

5.76194 

5.77062

3.34 

3.35 

3.36 

11.1556 

11.2225 

11.2896 

1.82757 

1.83030 

1.83303 

5.77927 

5.78792 

5.79655 

3.37 

3.38 

3.39 

11.3569 

11.4244 

11.4921 

1.83576 

1.83848 

1.84120

5.80517 

5.81378 

5.82237 

3.40 

11.5600 

1.84391 

5.83095 

3.41 

3.42 

3.43 

11.6281 

11.6964 

11.7649 

1.84662

1.84932 

1.85203 

5.83952 

5.84808 

5.85662 

3.44 

3.45 

3.46 

11.8336 

11.9025 

11.9716 

1.85472 

1.85742 

1.86011 

5.86515 

5.87367 

5.88218 

3.47 

3.48 

3.49 

12.0409 

12.1104 

12.1801 

1.86279 

1.86548 

1.86815 

5.89067 

5.89915 

5.90762 

3.50

12.2500 

1.87083 

5.91608 

N

N²

√N

√(10N)

3.50 

12.2500 

1.87083

5.91608

3.51 

3.52 

3.53 

12.3201 

12.3904 

12.4609 

1.87350 

1.87617 

1.87883 

5.92453

5.93296 

5.94138 

3.54 

3.55 

3.56 

12.5316 

12.6025 

12.6736 

1.88149

1.88414 

1.88680 

5.94979 

5.95819 

5.96657 

3.57 

3.58 

3.59 

12.7449 

12.8164 

12.8881 

1.88944 

1.89209 

1.89473 

5.97495 

5.98331 

5.99166 

3.60 

12.9600 

1.89737 

6.00000 

3.61 

3.62 

3.63 

13.0321 

13.1044 

13.1769 

1.90000 

1.90263 

1.90526

6.00833 

6.01664 

6.02495 

3.64 

3.65 

3.66 

13.2496 

13.3225 

13.3956 

1.90788 

1.91050 

1.91311 

6.03324 

6.04152 

6.04979 

3.67 

3.68 

3.69 

13.4689 

13.5424 

13.6161 

1.91572 

1.91833 

1.92094 

6.05805 

6.06630 

6.07454 

3.70 

13.6900 

1.92354 

6.08276 

3.71 

3.72 

3.73 

13.7641 

13.8384 

13.9129

1.92614 

1.92873 

1.93132 

6.09098 

6.09918 

6.10737 

3.74 

3.75 

3.76 

13.9876 

14.0625 

14.1376 

1.93391 

1.93649 

1.93907 

6.11555 

6.12372 

6.13188

3.77 

3.78 

3.79 

14.2129 

14.2884 

14.3641 

1.94165 

1.94422 

1.94679 

6.14003

6.14817

6.15630

3.80 

14.4400 

1.94936 

6.16441 

3.81 

3.82 

3.83 

14.5161 
14.5924 
14.6689

1.95192 

1.95448 

1.95704 

6.17252 

6.18061 

6.18870 

3.84 

3.85 

3.86 

14.7456

14.8225 

14.8996 

1.95959 

1.96214 

1.96469 

6.19677 

6.20484 

6.21289 

3.87 

3.88 

3.89 

14.9769 

15.0544 

15.1321

1.96723 

1.96977 

1.97231 

6.22093 

6.22896 

6.23699 

3.90

15.2100 

1.97484 

6.24500 

3.91 

3.92 
3.93

15.2881 

15.3664 

15.4449 

1.97737 

1.97990 

1.98242 

6.25300 

6.26099

6.26897 

3.94 

3.95 

3.96 

15.5236 

15.6025 

15.6816 

1.98494 

1.98746 

1.98997 

6.27694 

6.28490

6.29285 

3.97 

3.98 

3.99 

15.7609 

15.8404 

15.9201 

1.99249 

1.99499 

1.99750 

6.30079 

6.30872 

6.31664 

4.00 

16.0000 

2.00000 

6.32456 




Squares and Square Roots (Continued) 


N

N²

√N

√(10N)

4.00 

16.0000 

2.00000 

6.32456 

4.01 

4.02 

4.03 

16.0801 

16.1604

16.2409 

2.00250 

2.00499 

2.00749 

6.33246 

6.34035 

6.34823 

4.04 

4.05 

4.06 

16.3216 

16.4025 

16.4836 

2.00998 

2.01246 

2.01494 

6.35610 

6.36396 

6.37181 

4.07 

4.08 

4.09 

16.5649 

16.6464 

16.7281 

2.01742 

2.01990 

2.02237 

6.37966 

6.38749 

6.39531 

4.10 

16.8100 

2.02485 

6.40312 

4.11 

4.12 

4.13 

16.8921 

16.9744 

17.0569 

2.02731 

2.02978 

2.03224 

6.41093 

6.41872 

6.42651 

4.14 

4.15 

4.16 

17.1396 

17.2225 

17.3056

2.03470 

2.03715 

2.03961 

6.43428 

6.44205 

6.44981 

4.17 

4.18 

4.19 

17.3889 

17.4724 

17.5561 

2.04206 

2.04450 

2.04695 

6.45755 

6.46529 

6.47302 

4.20 

17.6400 

2.04939 

6.48074 

4.21 

4.22 

4.23 

17.7241 

17.8084 

17.8929 

2.05183 

2.05426 

2.05670 

6.48845 

6.49615 

6.50384 

4.24 

4.25 

4.26 

17.9776 

18.0625 

18.1476 

2.05913 

2.06155 

2.06398 

6.51153 

6.51920 

6.52687

4.27 

4.28 

4.29 

18.2329 

18.3184 

18.4041 

2.06640 

2.06882 

2.07123 

6.53452 

6.54217 

6.54981 

4.30 

18.4900 

2.07364 

6.55744 

4.31 

4.32 

4.33 

18.5761 

18.6624 

18.7489 

2.07605 

2.07846 

2.08087 

6.56506 

6.57267 

6.58027 

4.34 

4.35 

4.36 

18.8356 

18.9225 

19.0096 

2.08327 

2.08567 

2.08806 

6.58787 

6.59545 

6.60303 

4.37 

4.38 

4.39 

19.0969 
19.1844
19.2721 

2.09045 

2.09284 

2.09523 

6.61060 

6.61816 

6.62571

4.40 

19.3600 

2.09762 

6.63325 

4.41 

4.42 

4.43 

19.4481 

19.5364 

19.6249 

2.10000 

2.10238 

2.10476 

6.64078 

6.64831 

6.65582 

4.44

4.45 

4.46 

19.7136 

19.8025 

19.8916 

2.10713 

2.10950 

2.11187 

6.66333 

6.67083 

6.67832 

4.47 

4.48 

4.49 

19.9809 

20.0704 

20.1601 

2.11424 

2.11660 

2.11896 

6.68581 

6.69328 

6.70075 

4.50 

20.2500 

2.12132 

6.70820 

N

N²

√N

√(10N)

4.50 

20.2500 

2.12132 

6.70820 

4.51 

4.52 

4.53 

20.3401 

20.4304 

20.5209 

2.12368 

2.12603 

2.12838 

6.71565 

6.72309 

6.73053 

4.54 

4.55 

4.56 

20.6116 

20.7025 

20.7936 

2.13073 

2.13307 

2.13542 

6.73795 

6.74537 

6.75278 

4.57 

4.58 

4.59 

20.8849 

20.9764 

21.0681 

2.13776 

2.14009 

2.14243 

6.76018 

6.76757

6.77495 

4.60 

21.1600 

2.14476 

6.78233 

4.61 

4.62 
4.63

21.2521 

21.3444 

21.4369 

2.14709 

2.14942 

2.15174

6.78970 

6.79706 

6.80441 

4.64 

4.65 

4.66 

21.5296 

21.6225 

21.7156 

2.15407 

2.15639 

2.15870 

6.81175 

6.81909 

6.82642 

4.67 

4.68 

4.69 

21.8089 

21.9024 

21.9961 

2.16102 

2.16333 

2.16564

6.83374 

6.84105 

6.84836 

4.70 

22.0900 

2.16795 

6.85565

4.71 

4.72 

4.73 

22.1841 

22.2784 

22.3729 

2.17025 

2.17256 

2.17486 

6.86294 

6.87023 

6.87750

4.74 

4.75 

4.76 

22.4676 

22.5625 

22.6576 

2.17715 

2.17945 

2.18174 

6.88477 

6.89202 

6.89928 

4.77 

4.78 

4.79 

22.7529 

22.8484 

22.9441 

2.18403 

2.18632 

2.18861 

6.90652 

6.91375

6.92098 

4.80 

23.0400 

2.19089 

6.92820 

4.81 

4.82 

4.83 

23.1361 

23.2324 

23.3289 

2.19317 

2.19545 

2.19773 

6.93542 

6.94262 

6.94982 

4.84 

4.85 

4.86 

23.4256 

23.5225 

23.6196 

2.20000 

2.20227 

2.20454 

6.95701 

6.96419 

6.97137 

4.87 

4.88 

4.89 

23.7169 

23.8144 

23.9121 

2.20681 

2.20907 

2.21133 

6.97854 

6.98570 

6.99285 

4.90 

24.0100 

2.21359 

7.00000 

4.91 

4.92 

4.93 

24.1081 

24.2064 

24.3049 

2.21585 

2.21811 

2.22036 

7.00714 

7.01427 

7.02140 

4.94 

4.95 

4.96 

24.4036 

24.5025 

24.6016 

2.22261 

2.22486 

2.22711 

7.02851 

7.03562 

7.04273 

4.97 

4.98 

4.99 

24.7009 

24.8004 

24.9001 

2.22935 

2.23159 

2.23383 

7.04982 

7.05691 

7.06399 

5.00 

25.0000 

2.23607 

7.07107 

N

N²

√N

√(10N)





Squares and Square Roots (Continued) 


N

N²

√N

√(10N)

5.00 

25.0000 

2.23607 

7.07107 

5.01

5.02

5.03

25.1001 

25.2004 

25.3009 

2.23830 

2.24054 

2.24277 

7.07814 

7.08520 

7.09225 

5.04 

5.05

5.06

25.4016 

25.5025 

25.6036 

2.24499 

2.24722 

2.24944 

7.09930 

7.10634 

7.11337 

5.07

5.08

5.09

25.7049 

25.8064 

25.9081 

2.25167 

2.25389 

2.25610 

7.12039 

7.12741 

7.13442 

5.10 

26.0100 

2.25832 

7.14143 

5.11 

5.12
5.13 

26.1121 

26.2144 

26.3169 

2.26053 

2.26274 

2.26495 

7.14843 

7.15542 

7.16240 

5.14 

5.15 

5.16 

26.4196 

26.5225

26.6256 

2.26716 

2.26936 

2.27156 

7.16938 

7.17635 

7.18331 

5.17 

5.18 
5.19

26.7289 

26.8324 

26.9361 

2.27376 

2.27596 

2.27816 

7.19027 

7.19722 

7.20417 

5.20 

27.0400 

2.28035 

7.21110 

5.21 

5.22 

5.23 

27.1441

27.2484 

27.3529 

2.28254 

2.28473 

2.28692 

7.21803 

7.22496 

7.23187 

5.24 

5.25 
5.26

27.4576 

27.5625 

27.6676 

2.28910 

2.29129 

2.29347 

7.23878 

7.24569 

7.25259 

5.27

5.28 

5.29 

27.7729 

27.8784 

27.9841 

2.29565 

2.29783 

2.30000 

7.25948 

7.26636 

7.27324 

5.30 

28.0900 

2.30217 

7.28011 

5.31

5.32 

5.33 

28.1961 

28.3024

28.4089 

2.30434 

2.30651 

2.30868 

7.28697

7.29383

7.30068 

5.34 

5.35 

5.36 

28.5156 

28.6225

28.7296 

2.31084 

2.31301 

2.31517 

7.30753 

7.31437 

7.32120 

5.37 

5.38 

5.39 

28.8369 

28.9444 

29.0521 

2.31733 

2.31948 

2.32164 

7.32803 

7.33485 

7.34166 

5.40 

29.1600 

2.32379 

7.34847 

5.41

5.42

5.43

29.2681 

29.3764 

29.4849 

2.32594 

2.32809 

2.33024 

7.35527 

7.36206 

7.36885 

5.44 

5.45

5.46 

29.5936 

29.7025 

29.8116 

2.33238 

2.33452 

2.33666 

7.37564 

7.38241 

7.38918 

5.47 

5.48

5.49

29.9209 

30.0304 

30.1401 

2.33880 

2.34094 

2.34307 

7.39594 

7.40270 

7.40945 

5.50 

30.2500 

2.34521 

7.41620 

N

N²

√N

√(10N)


N

N²

√N

√(10N)

5.50

30.2500 

2.34521 

7.41620

5.51 

5.52

5.53 

30.3601 

30.4704 

30.5809 

2.34734 

2.34947 

2.35160 

7.42294 

7.42967 

7.43640 

5.54 

5.55 

5.56 

30.6916 

30.8025 

30.9136 

2.35372 

2.35584 

2.35797 

7.44312 

7.44983 

7.45654 

5.57 

5.58 

5.59 

31.0249 

31.1364 

31.2481 

2.36008 

2.36220 

2.36432 

7.46324

7.46994

7.47663

5.60 

31.3600 

2.36643 

7.48331

5.61 

5.62 

5.63 

31.4721 

31.5844 

31.6969 

2.36854 

2.37065 

2.37276 

7.48999

7.49667 

7.50333 

5.64 

5.65 

5.66 

31.8096 

31.9225 

32.0356 

2.37487 

2.37697 

2.37908 

7.50999 

7.51665 

7.52330 

5.67 

5.68 

5.69 

32.1489 

32.2624 

32.3761 

2.38118

2.38328 

2.38537 

7.52994 

7.53658 

7.54321 

5.70

32.4900

2.38747 

7.54983 

5.71 

5.72 

5.73 

32.6041 

32.7184 

32.8329 

2.38956 

2.39165 

2.39374 

7.55645

7.56307

7.56968

5.74 

5.75 

5.76 

32.9476 

33.0625 

33.1776 

2.39583 

2.39792 

2.40000 

7.57628 

7.58288 

7.58947 

5.77 

5.78 

5.79

33.2929 

33.4084 

33.5241 

2.40208 

2.40416 

2.40624 

7.59605 

7.60263 

7.60920 

5.80 

33.6400 

2.40832 

7.61577

5.81 

5.82 

5.83 

33.7561

33.8724 

33.9889 

2.41039 

2.41247 

2.41454 

7.62234 

7.62889 

7.63544 

5.84 

5.85 

5.86 

34.1056 

34.2225 

34.3396 

2.41661 

2.41868 

2.42074 

7.64199

7.64853

7.65506

5.87 

5.88 

5.89 

34.4569 

34.5744 

34.6921 

2.42281 

2.42487 

2.42693 

7.66159 

7.66812 

7.67463 

5.90 

34.8100 

2.42899 

7.68115

5.91 

5.92 

5.93 

34.9281 

35.0464 

35.1649 

2.43105

2.43311 

2.43516 

7.68765 

7.69415 

7.70065 

5.94 

5.95 

5.96 

35.2836 

35.4025 

35.5216 

2.43721 

2.43926 

2.44131 

7.70714

7.71362 

7.72010 

5.97 

5.98

5.99

35.6409 

35.7604 

35.8801 

2.44336 

2.44540 

2.44745 

7.72658

7.73305 

7.73951 

6.00 

36.0000 

2.44949 

7.74597 

N

N²

√N

√(10N)



Squares and Square Roots (Continued) 


N

N²

√N

√(10N)

6.00

36.0000

2.44949

7.74597

6.01

6.02 

6.03 

36.1201 

36.2404 

36.3609 

2.45153 

2.45357 

2.45561 

7.75242 

7.75887 

7.76531 

6.04 

6.05 

6.06 

36.4816 

36.6025 

36.7236 

2.45764 

2.45967 

2.46171 

7.77174 

7.77817 

7.78460 

6.07 

6.08 
6.09 

36.8449 

36.9664 

37.0881 

2.46374 

2.46577 

2.46779 

7.79102 

7.79744 

7.80385 

6.10 

37.2100 

2.46982 

7.81025 

6.11 

6.12 

6.13 

37.3321 

37.4544 

37.5769 

2.47184 

2.47386 

2.47588 

7.81665 

7.82304 

7.82943 

6.14 

6.15 

6.16 

37.6996 

37.8225 

37.9456 

2.47790 

2.47992 

2.48193 

7.83582 

7.84219 

7.84857 

6.17 

6.18 
6.19

38.0689 

38.1924 

38.3161 

2.48395 

2.48596 

2.48797 

7.85493 

7.86130 

7.86766 

6.20 

38.4400 

2.48998 

7.87401 

6.21 

6.22 

6.23 

38.5641 

38.6884 

38.8129 

2.49199 

2.49399 

2.49600 

7.88036 

7.88670 

7.89303

6.24 

6.25 

6.26 

38.9376 

39.0625 

39.1876 

2.49800 

2.50000 

2.50200 

7.89937 

7.90569

7.91202 

6.27 

6.28 
6.29 

39.3129 

39.4384 

39.5641 

2.50400 

2.50599 

2.50799 

7.91833 
7.92465 
7.93096

6.30 

39.6900 

2.50998 

7.93725 

6.31 

6.32 

6.33 

39.8161 

39.9424 

40.0689 

2.51197 

2.51396 

2.51595 

7.94355 

7.94984 

7.95613 

6.34 

6.35 

6.36 

40.1956 

40.3225 

40.4496 

2.51794 

2.51992 

2.52190 

7.96241 

7.96869 

7.97496 

6.37 

6.38 

6.39 

40.5769 

40.7044

40.8321 

2.52389 

2.52587 

2.52784 

7.98123 

7.98749 

7.99375 

6.40 

40.9600 

2.52982 

8.00000 

6.41 

6.42 

6.43 

41.0881 

41.2164 

41.3449 

2.53180 

2.53377 

2.53574 

8.00625 

8.01249 

8.01873 

6.44 

6.45 

6.46 

41.4736 

41.6025 

41.7316 

2.53772

2.53969

2.54165

8.02496 

8.03119 

8.03741 

6.47 

6.48 

6.49 

41.8609 

41.9904 

42.1201 

2.54362 

2.54558 

2.54755 

8.04363 

8.04984 

8.05605 

6.50

42.2500 

2.54951 

8.06226 

N

N²

√N

√(10N)


N

N²

√N

√(10N)

6.50 

42.2500 

2.54951 

8.06226 

6.51 

6.52 

6.53 

42.3801 

42.5104 

42.6409 

2.55147 

2.55343 

2.55539 

8.06846 

8.07465 

8.08084 

6.54 

6.55 

6.56 

42.7716 

42.9025 

43.0336 

2.55734 

2.55930 

2.56125 

8.08703

8.09321 

8.09938 

6.57 

6.58 

6.59 

43.1649 

43.2964 

43.4281 

2.56320 

2.56515 

2.56710 

8.10555 

8.11172 

8.11788

6.60 

43.5600 

2.56905 

8.12404 

6.61 

6.62 

6.63 

43.6921 

43.8244 

43.9569 

2.57099 

2.57294 

2.57488 

8.13019 

8.13634 

8.14248 

6.64 

6.65 

6.66 

44.0896 

44.2225 

44.3556 

2.57682 

2.57876 

2.58070 

8.14862 

8.15475 

8.16088 

6.67 

6.68 
6.69 

44.4889 

44.6224 

44.7561 

2.58263 

2.58457 

2.58650

8.16701 

8.17313 

8.17924 

6.70 

44.8900 

2.58844 

8.18535 

6.71 

6.72 

6.73 

45.0241 

45.1584 

45.2929 

2.59037 

2.59230 

2.59422 

8.19146 

8.19756 

8.20366 

6.74 

6.75 

6.76 

45.4276 

45.5625 

45.6976 

2.59615 

2.59808 

2.60000 

8.20975

8.21584 

8.22192 

6.77 

6.78 

6.79 

45.8329 

45.9684 

46.1041 

2.60192 

2.60384 

2.60576 

8.22800 

8.23408 

8.24015 

6.80 

46.2400 

2.60768

8.24621

6.81 

6.82 

6.83

46.3761 

46.5124 

46.6489 

2.60960 
2.61151
2.61343

8.25227 

8.25833 

8.26438 

6.84 

6.85 

6.86 

46.7856

46.9225 

47.0596 

2.61534 

2.61725 

2.61916 

8.27043 

8.27647 

8.28251 

6.87 

6.88 
6.89 

47.1969 

47.3344 

47.4721 

2.62107 

2.62298 

2.62488 

8.28855 

8.29458 

8.30060 

6.90 

47.6100 

2.62679 

8.30662 

6.91 

6.92 

6.93 

47.7481 

47.8864 

48.0249 

2.62869 

2.63059 

2.63249 

8.31264 

8.31865 

8.32466 

6.94 

6.95 

6.96 

48.1636 

48.3025 

48.4416 

2.63439 

2.63629 

2.63818 

8.33067 

8.33667 

8.34266 

6.97 

6.98 

6.99 

48.5809 

48.7204 

48.8601 

2.64008 

2.64197 

2.64386 

8.34865 

8.35464 

8.36062 

7.00 

49.0000 

2.64575 

8.36660 

N

N²

√N

√(10N)





Squares and Square Roots 


N

N²

√N

√(10N)

7.00

49.0000

2.64575

8.36660 

7.01 

7.02 

7.03 

49.1401 

49.2804 

49.4209 

2.64764 

2.64953 

2.65141 

8.37257 

8.37854 

8.38451 

7.04 

7.05 

7.06 

49.5616 

49.7025 

49.8436 

2.65330 

2.65518 

2.65707 

8.39047 

8.39643

8.40238 

7.07 

7.08 

7.09 

49.9849

50.1264 

50.2681 

2.65895 

2.66083 

2.66271 

8.40833

8.41427

8.42021

7.10 

50.4100 

2.66458 

8.42615 

7.11 

7.12 

7.13 

50.5521 

50.6944 

50.8369 

2.66646 

2.66833 

2.67021 

8.43208 

8.43801 

8.44393 

7.14 

7.15 

7.16 

50.9796 

51.1225 

51.2656 

2.67208 

2.67395 

2.67582 

8.44985 

8.45577 

8.46168 

7.17 

7.18 

7.19 

51.4089 

51.5524 

51.6961 

2.67769 

2.67955 

2.68142 

8.46759 

8.47349 

8.47939 

7.20 

51.8400 

2.68328 

8.48528 

7.21 

7.22 

7.23 

51.9841 

52.1284 

52.2729 

2.68514 

2.68701 

2.68887 

8.49117 

8.49706

8.50294 

7.24 

7.25 

7.26 

52.4176 

52.5625 

52.7076 

2.69072 

2.69258 

2.69444 

8.50882 

8.51469 

8.52056 

7.27 

7.28 

7.29 

52.8529 

52.9984 

53.1441 

2.69629 

2.69815 

2.70000

8.52643 

8.53229 

8.53815 

7.30 

53.2900 

2.70185 

8.54400 

7.31 

7.32 

7.33 

53.4361 

53.5824 

53.7289 

2.70370 

2.70555 

2.70740 

8.54985 

8.55570 

8.56154

7.34 

7.35 

7.36 

53.8756 

54.0225 

54.1696

2.70924 

2.71109 

2.71293 

8.56738 

8.57321 

8.57904 

7.37 

7.38 

7.39 

54.3169 

54.4644 

54.6121 

2.71477 

2.71662 

2.71846 

8.58487 

8.59069 

8.59651 

7.40 

54.7600 

2.72029 

8.60233 

7.41 

7.42 

7.43 

54.9081 

55.0564

55.2049 

2.72213 

2.72397 

2.72580 

8.60814 

8.61394 

8.61974 

7.44 

7.45 

7.46 

55.3536

55.5025 

55.6516 

2.72764 

2.72947 

2.73130 

8.62554 

8.63134 

8.63713 

7.47 

7.48 

7.49 

55.8009 

55.9504 

56.1001 

2.73313 

2.73496 

2.73679 

8.64292 

8.64870 

8.65448 

7.50

56.2500

2.73861

8.66025 

N

N²

√N

√(10N)


N

N²

√N

√(10N)

7.50 

56.2500 

2.73861 

8.66025 

7.51 

7.52 

7.53 

56.4001 

56.5504 

56.7009 

2.74044 

2.74226 

2.74408

8.66603

8.67179

8.67756

7.54 

7.55 

7.56 

56.8516 

57.0025 

57.1536

2.74591
2.74773
2.74955

8.68332

8.68907

8.69483

7.57 

7.58

7.59

57.3049

57.4564

57.6081

2.75136

2.75318
2.75500

8.70057

8.70632

8.71206

7.60 

57.7600 

2.75681 

8.71780 

7.61 

7.62 

7.63 

57.9121 

58.0644 

58.2169 

2.75862 

2.76043 

2.76225 

8.72353 

8.72926 

8.73499 

7.64 

7.65 

7.66 

58.3696 

58.5225 

58.6756 

2.76405 

2.76586 

2.76767 

8.74071

8.74643 

8.75214 

7.67 

7.68 

7.69 

58.8289 

58.9824 

59.1361 

2.76948 

2.77128 

2.77308 

8.75785

8.76356 

8.76926 

7.70 

59.2900 

2.77489 

8.77496 

7.71 

7.72 

7.73 

59.4441

59.5984 

59.7529 

2.77669
2.77849
2.78029

8.78066 

8.78635 

8.79204 

7.74 

7.75 

7.76 

59.9076 

60.0625 

60.2176 

2.78209

2.78388 

2.78568 

8.79773 

8.80341 

8.80909

7.77 

7.78 

7.79 

60.3729 

60.5284

60.6841 

2.78747 

2.78927 

2.79106 

8.81476 

8.82043 

8.82610 

7.80 

60.8400 

2.79285 

8.83176

7.81 

7.82 

7.83 

60.9961 

61.1524 

61.3089 

2.79464 

2.79643 

2.79821 

8.83742 

8.84308 

8.84873 

7.84 

7.85 

7.86 

61.4656 

61.6225 

61.7796 

2.80000 

2.80179 

2.80357 

8.85438 

8.86002 

8.86566 

7.87 

7.88 

7.89 

61.9369 

62.0944 

62.2521

2.80535 

2.80713 

2.80891 

8.87130 

8.87694

8.88257 

7.90 

62.4100 

2.81069 

8.88819 

7.91 

7.92 

7.93 

62.5681 

62.7264 

62.8849 

2.81247 

2.81425 

2.81603 

8.89382 

8.89944 

8.90505

7.94 

7.95 

7.96 

63.0436 

63.2025 

63.3616 

2.81780 

2.81957 

2.82135 

8.91067

8.91628 

8.92188 

7.97 

7.98 

7.99 

63.5209 

63.6804 

63.8401 

2.82312 

2.82489 

2.82666 

8.92749 

8.93308 

8.93868 

8.00 

64.0000 

2.82843 

8.94427 


N

N²

√N

√(10N)





Squares and Square Roots (Continued)


N

N²

√N

√(10N)


8.00 

64.0000 

2.82843 

8.94427 

8.01 

8.02 

8.03 

64.1601 

64.3204 

64.4809 

2.83019 

2.83196 

2.83373 

8.94986 

8.95545

8.96103 

8.04 

8.05 

8.06 

64.6416 

64.8025 

64.9636 

2.83549 

2.83725 

2.83901 

8.96660 

8.97218 

8.97775 

8.07 

8.08 
8.09 

65.1249
65.2864 
65.4481 

2.84077 

2.84253 

2.84429 

8.98332 

8.98888 

8.99444 

8.10 

65.6100 

2.84605 

9.00000 

8.11 

8.12 

8.13 

65.7721 

65.9344 

66.0969 

2.84781 

2.84956 

2.85132 

9.00555 

9.01110 

9.01665 

8.14 

8.15 

8.16 

66.2596 

66.4225

66.5856 

2.85307 

2.85482 

2.85657 

9.02219 

9.02774 

9.03327 

8.17 

8.18 
8.19 

66.7489 

66.9124 

67.0761 

2.85832 

2.86007 

2.86182 

9.03881 

9.04434 

9.04986 

8.20 

67.2400 

2.86356 

9.05539 

8.21 

8.22 

8.23 

67.4041 

67.5684

67.7329 

2.86531 

2.86705 

2.86880 

9.06091 

9.06642 

9.07193 

8.24 

8.25 

8.26 

67.8976 

68.0625 

68.2276

2.87054 

2.87228 

2.87402 

9.07744 

9.08295 

9.08845

8.27 

8.28 
8.29 

68.3929 

68.5584 

68.7241 

2.87576 

2.87750 

2.87924 

9.09395 
9.09945 
9.10494

8.30 

68.8900 

2.88097 

9.11043 

8.31 

8.32 

8.33 

69.0561 

69.2224 

69.3889 

2.88271 

2.88444 

2.88617 

9.11592 

9.12140 

9.12688 

8.34 

8.35 

8.36 

69.5556 

69.7225 

69.8896 

2.88791 

2.88964 

2.89137 

9.13236 

9.13783 

9.14330 

8.37 

8.38 

8.39 

70.0569 

70.2244 

70.3921 

2.89310 

2.89482 

2.89655 

9.14877 

9.15423 

9.15969 

8.40 

70.5600 

2.89828 

9.16515 

8.41 

8.42 

8.43 

70.7281 

70.8964 

71.0649 

2.90000 

2.90172 

2.90345

9.17061 

9.17606 

9.18150 

8.44 

8.45

8.46 

71.2336 

71.4025 

71.5716

2.90517 

2.90689 

2.90861 

9.18695 

9.19239 

9.19783 

8.47 

8.48 

8.49 

71.7409 

71.9104 

72.0801 

2.91033 

2.91204 

2.91376 

9.20326 

9.20869 

9.21412 

8.50 

72.2500

2.91548 

9.21954 

N

N²

√N

√(10N)


N

N²

√N

√(10N)


8.50 

72.2500 

2.91548 

9.21954 

8.51

8.52 

8.53 

72.4201 

72.5904 

72.7609 

2.91719 

2.91890 

2.92062 

9.22497

9.23038 

9.23580 

8.54 

8.55 

8.56 

72.9316 

73.1025 

73.2736 

2.92233 

2.92404 

2.92575 

9.24121 

9.24662

9.25203 

8.57 

8.58 

8.59 

73.4449 

73.6164 

73.7881 

2.92746 

2.92916 

2.93087 

9.25743

9.26283

9.26823

8.60 

73.9600 

2.93258 

9.27362 

8.61 

8.62 

8.63 

74.1321 

74.3044 

74.4769 

2.93428 

2.93598 

2.93769 

9.27901 

9.28440 

9.28978 

8.64 

8.65 

8.66 

74.6496 

74.8225 

74.9956 

2.93939 

2.94109 

2.94279 

9.29516 

9.30054 

9.30591 

8.67 

8.68 
8.69 

75.1689 

75.3424

75.5161 

2.94449 

2.94618 

2.94788 

9.31128 

9.31665 

9.32202 

8.70 

75.6900 

2.94958 

9.32738 

8.71 

8.72 
8.73 

75.8641 

76.0384 

76.2129 

2.95127 

2.95296 

2.95466 

9.33274 

9.33809 

9.34345 

8.74 

8.75 

8.76 

76.3876 

76.5625 

76.7376 

2.95635

2.95804

2.95973

9.34880 

9.35414 

9.35949 

8.77 

8.78 

8.79 

76.9129 
77.0884
77.2641 

2.96142 

2.96311 

2.96479

9.36483 

9.37017 

9.37550 

8.80 

77.4400 

2.96648

9.38083 

8.81 

8.82 

8.83 

77.6161 

77.7924 

77.9689

2.96816 

2.96985 

2.97153

9.38616 

9.39149 

9.39681 

8.84 

8.85 

8.86 

78.1456 

78.3225 

78.4996 

2.97321 

2.97489 

2.97658

9.40213 

9.40744 

9.41276 

8.87 

8.88 
8.89 

78.6769 

78.8544 

79.0321 

2.97825 

2.97993 

2.98161 

9.41807 

9.42338 

9.42868

8.90

79.2100

2.98329 

9.43398 

8.91 

8.92 

8.93 

79.3881 

79.5664 

79.7449 

2.98496 

2.98664 

2.98831 

9.43928 

9.44458 

9.44987 

8.94 

8.95 

8.96 

79.9236 

80.1025 

80.2816 

2.98998 

2.99166 

2.99333 

9.45516 

9.46044 

9.46573 

8.97 

8.98 

8.99 

80.4609 

80.6404 

80.8201 

2.99500 

2.99666 

2.99833 

9.47101 

9.47629 

9.48156 

9.00 

81.0000 

3.00000 

9.48683 

N

N²

√N

√(10N)





Squares and Square Roots (Continued)


N

N²

√N

√(10N)

9.00 

81.0000 

3.00000 

9.48683 

9.01 

9.02 

9.03 

81.1801 

81.3604 

81.5409 

3.00167

3.00333

3.00500

9.49210 

9.49737 

9.50263 

9.04 

9.05 

9.06 

81.7216 

81.9025 

82.0836 

3.00666 

3.00832 

3.00998 

9.50789 

9.51315 

9.51840 

9.07 

9.08 

9.09 

82.2649 

82.4464 

82.6281 

3.01164 

3.01330 

3.01496 

9.52365 

9.52890 

9.53415 

9.10 

82.8100 

3.01662 

9.53939 

9.11 

9.12 

9.13 

82.9921 

83.1744 

83.3569 

3.01828 

3.01993 

3.02159 

9.54463 

9.54987

9.55510 

9.14 

9.15 

9.16 

83.5396 

83.7225

83.9056 

3.02324 

3.02490 

3.02655 

9.56033 

9.56556 

9.57079 

9.17 

9.18 

9.19 

84.0889 

84.2724 

84.4561 

3.02820 

3.02985 

3.03150 

9.57601 

9.58123 

9.58645 

9.20 

84.6400 

3.03315 

9.59166 

9.21 

9.22 

9.23 

84.8241 

85.0084 

85.1929 

3.03480 

3.03645 

3.03809 

9.59687

9.60208 

9.60729 

9.24 

9.25 

9.26 

85.3776 

85.5625 

85.7476 

3.03974 

3.04138 

3.04302 

9.61249 

9.61769 

9.62289 

9.27 

9.28 

9.29 

85.9329 

86.1184 

86.3041 

3.04467 

3.04631 

3.04795 

9.62808 

9.63328 

9.63846 

9.30 

86.4900 

3.04959 

9.64365 

9.31 

9.32 

9.33 

86.6761 

86.8624 

87.0489 

3.05123 

3.05287 

3.05450 

9.64883 

9.65401 

9.65919 

9.34 

9.35 

9.36 

87.2356 

87.4225 

87.6096 

3.05614 

3.05778 

3.05941 

9.66437 

9.66954 

9.67471 

9.37 

9.38 

9.39 

87.7969 

87.9844 

88.1721 

3.06105 

3.06268 

3.06431 

9.67988 

9.68504 

9.69020 

9.40 

88.3600 

3.06594

9.69536 

9.41 

9.42 

9.43 

88.5481 

88.7364 

88.9249 

3.06757 

3.06920 

3.07083 

9.70052 

9.70567 

9.71082 

9.44 

9.45 

9.46 

89.1136 

89.3025 

89.4916 

3.07246 

3.07409 

3.07571 

9.71597 

9.72111 

9.72625

9.47 

9.48 

9.49 

89.6809 

89.8704 

90.0601 

3.07734 

3.07896 

3.08058 

9.73139 

9.73653 

9.74166 

9.50 

90.2500 

3.08221 

9.74679 

N

N²

√N

√(10N)


N

N²

√N

√(10N)


9.50 

90.2500 

3.08221 

9.74679 

9.51 

9.52 

9.53 

90.4401 

90.6304 

90.8209 

3.08383 

3.08545 

3.08707 

9.75192 

9.75705 

9.76217

9.54 

9.55 

9.56 

91.0116 

91.2025 

91.3936 

3.08869 

3.09031 

3.09192 

9.76729 

9.77241

9.77753

9.57 

9.58 

9.59 

91.5849 

91.7764 

91.9681 

3.09354 

3.09516 

3.09677 

9.78264 

9.78775 

9.79285

9.60 

92.1600 

3.09839 

9.79796 

9.61 

9.62 

9.63 

92.3521 

92.5444 

92.7369 

3.10000 

3.10161 

3.10322 

9.80306 

9.80816 

9.81326 

9.64 

9.65 

9.66 

92.9296 

93.1225 

93.3156 

3.10483

3.10644 

3.10805 

9.81835 

9.82344 

9.82853 

9.67 

9.68 

9.69 

93.5089 

93.7024 

93.8961 

3.10966 

3.11127 

3.11288 

9.83362

9.83870

9.84378 

9.70 

94.0900 

3.11448 

9.84886

9.71 

9.72 

9.73 

94.2841 

94.4784 

94.6729 

3.11609 

3.11769 

3.11929 

9.85393 

9.85901 

9.86408 

9.74 

9.75 

9.76 

94.8676 

95.0625 

95.2576 

3.12090 

3.12250 

3.12410 

9.86914 

9.87421 

9.87927 

9.77 

9.78 

9.79 

95.4529 

95.6484 

95.8441 

3.12570 

3.12730 

3.12890 

9.88433 

9.88939 

9.89444 

9.80 

96.0400 

3.13050 

9.89949

9.81 

9.82 

9.83 

96.2361 

96.4324 

96.6289 

3.13209 

3.13369 

3.13528 

9.90454 

9.90959 

9.91464

9.84 

9.85 

9.86 

96.8256 

97.0225 

97.2196 

3.13688 

3.13847 

3.14006 

9.91968

9.92472 

9.92975 

9.87 

9.88 

9.89 

97.4169 

97.6144 

97.8121 

3.14166 

3.14325 

3.14484 

9.93479 

9.93982 

9.94485 

9.90 

98.0100 

3.14643 

9.94987 

9.91 

9.92 

9.93 

98.2081 

98.4064

98.6049 

3.14802 

3.14960 

3.15119 

9.95490 

9.95992 

9.96494 

9.94 

9.95 

9.96 

98.8036 

99.0025 

99.2016 

3.15278 

3.15436 

3.15595 

9.96995 

9.97497 

9.97998 

9.97 

9.98 

9.99 

99.4009 

99.6004 

99.8001 

3.15753 

3.15911 

3.16070 

9.98499 

9.98999 

9.99500 

10.00 

100.000 

3.16228 

10.0000

N

N²

√N

√(10N)
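
Any row of the table of squares and square roots can be regenerated directly, which gives a convenient check on an entry that looks doubtful. The following is a minimal Python sketch, not part of the original text; the names `table_row` and `sqrt_by_table` are illustrative only. It also shows why a √(10N) column is tabulated: every positive number equals N or 10N times an even power of 10, with N between 1 and 10, so one of the two root columns always applies.

```python
import math


def table_row(n):
    """Return (N, N^2, sqrt(N), sqrt(10N)) rounded to the table's precision."""
    return (n, round(n * n, 4), round(math.sqrt(n), 5), round(math.sqrt(10 * n), 5))


def sqrt_by_table(x):
    """Square root of any positive x, found the way the table is meant to be used.

    x is scaled by even powers of 10 until it lies in [1, 100); the root is
    then read from the sqrt(N) column (x < 10) or the sqrt(10N) column
    (x >= 10) and rescaled by the matching power of 10.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    k = 0
    while x >= 100:
        x /= 100
        k += 1
    while x < 1:
        x *= 100
        k -= 1
    if x < 10:
        root = math.sqrt(x)              # sqrt(N) column with N = x
    else:
        root = math.sqrt(10 * (x / 10))  # sqrt(10N) column with N = x / 10
    return root * 10 ** k


print(table_row(4.50))
print(sqrt_by_table(473))
```

For instance, 473 = 4.73 × 10², so its root is ten times the √N entry for N = 4.73, whereas 47.3 = 10 × 4.73 is read from the √(10N) entry in the same row.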


Table II. Normal Areas and Ordinates*


t

φ(t)

∫₀ᵗ φ(t) dt


t

φ(t)

∫₀ᵗ φ(t) dt


t

φ(t)

∫₀ᵗ φ(t) dt


.00 

.39894 

.00000 

.45 

.36053 

.17364

.90

.26609

.31594 

.01 

.39892 

.00399 

.46 

.35889 

.17724

.91 

.26369 

.31859 

.02 

.39886

.00798

.47

.35723

.18082

.92

.26129 

.32121 

.03 

.39876 

.01197 

.48

.35553

.18439

.93

.25888 

.32381 

.04 

.39862 

.01595 

.49 

.35381 

.18793

.94 

.25647 

.32639 

.05 

.39844 

.01994 

.50 

.35207 

.19146

.95 


.32894 

.06 

.39822 

.02392 

.51 

.35029 

.19497

.96 


.33147 

.07 

.39797 

.02790 

.52 

.34849

.19847

.97 

.24923 

.33398 

.08 

.39767 

.03188 

.53 

.34667 

.20194 

.98 

.24681 

.33646 

.09 

.39733 

.03586 

.54 

.34482 

.20540 

.99 

.24439 

.33891 

.10 

.39695 

.03983 

.55 

.34294 

.20884 


1.00

.24197

.34134 

.11 

.39654 

.04380 

.56 

.34105 

.21226 

1.01 

.23955 

.34375 

.12 

.39608 

.04776 

.57 

.33912 

.21566 

1.02 

.23713 

.34614 

.13 

.39559 

.05172 

.58 

.33718 

.21904 

1.03 

.23471 

.34850

.14

.39505

.05567 

.59 

.33521 

.22240 

1.04 

.23230 

.35083 

.15 

.39448 

.05962 

.60 

.33322 

.22575 

1.05 

.22988 

.35314 

.16 

.39387 

.06356 

.61 

.33121 

.22907 

1.06 

.22747 

.35543 

.17 

.39322 

.06749 

.62 

.32918 

.23237 

1.07 


.35769 

.18 

.39253 

.07142 

.63 

,32713 

.23565 

1.08 

.22265 

.35993 

.19 

.39181 

.07535 

.64 

.32506 

.23891 

1.09 

.22025 


.20 

.39104 

.07926 

.65 

.32297 

.24215 


1.10

.21785

.36433 

.21 

.39024 

.08317 

.66 

.32086 

.24537 

1.11

.21546 


.22 

.38940 

.08706

.67 

.31874 

.24857 

1.12


.36864 

.23 

.38853 

.09095 

.68 

.31659 

.25175 

1.13 


.37076

.24 

.38762 

.09483 

.69 

.31443 

.25490 

1.14 

.20831 

.37286 

.25 

.38667 

.09871 

.70 

.31225 

.25804

1.15 

.20594 

.37493 

.26 

.38568 

.10257

.71 

.31006 

.26115 

1.16 

.20357

.37698 

.27 

.38466 

.10642 

.72 

.30785 

.26424

1.17 



.28 

.38361 

.11026

.73 

.30563 

.26730 

1.18 

.19886

.38100

.29 

.38251 

.11409 

.74

.30339 

.27035 

1.19 

.19652

.38298 

.30 

.38139 

.11791

.75 

.30114 

.27337 

1.20 

.19419

.38493 

.31 

.38023 

.12172 

.76 

.29887 

.27637 

1.21 

.19186

.38686 

.32 

.37903 

.12552 

.77 

.29659 

.27935 

1.22 

.18954

.38877 

.33 

.37780 

.12930

.78 

.29431 

.28230 

1.23 

.18724

.39065

.34 

.37654 

.13307

.79 

.29200 

.28524 

1.24 

.18494

.39251 

.35 

.37524

.13683

.80 

.28969 

.28814 

1.25 

.18265

.39435 

.36 

.37391 

.14058

.81 

.28737 

.29103 

1.26 

.18037

.39617 

.37 

.37255 

.14431 

.82 

.28504

.29389 

1.27 

.17810

.39796 

.38 

.37115 

.14803

.83 

.28269 

.29673 

1.28 

.17585

.39973 

.39

.36973

.15173

.84 

.28034 

.29955 

1.29 



.40 

.36827 

.15542

.85 

.27798 

.30234 

1.30 

.17137


.41 

.36678 

.15910 

.86 

.27562 

.30511 

1.31 

.16915

.40490 

.42 

.36526 

.16276 

.87 

.27324 

.30785 

1.32 

.16694


.43 

.36371 

.16640

.88 

.27086 

.31057 

1.33 

.16474


.44 

.36213 

.17003 

.89 

.26848 

.31327 

1.34 

.16256

.40988 


* Reprinted, by permission, from Kenney, Mathematics of Statistics, Part One, pp. 225-227, D.
Van Nostrand, New York,










Normal Areas and Ordinates (Continued) 


t

φ(t)

∫₀ᵗ φ(t) dt


t

φ(t)

∫₀ᵗ φ(t) dt


t

φ(t)

∫₀ᵗ φ(t) dt

1.35 

.16038 

.41149 

1.80

.07895 

.46407 

2.25 

.03174

.48778

1.36 

.15822

.41309 

1.81

.07754 

.46485 

2.26 

.03103 

.48809

1.37 

.15608

.41466 

1.82 

.07614 

.46562 

2.27 

.03034 

.48840 

1.38 

.15395

.41621 

1.83 

.07477 

.46638 

2.28

.02965 

.48870 

1.39

.15183

.41774 

1.84 

.07341 

.46712 

2.29 

.02898

.48899 

1.40 

.14973

.41924 

1.85 

.07206 

.46784 

2.30 

.02833 

.48928 

1.41

.14764

.42073 

1.86 

.07074 

.46856 

2.31 

.02768 

.48956 

1.42 

.14556

.42220 

1.87 

.06943 

.46926 

2.32 

.02705 

.48983 

1.43 

.14350

.42364 

1.88

.06814 

.46995 

2.33 

.02643 

.49010 

1.44 

.14146 

.42507 

1.89 

.06687 

.47062 

2.34 

.02582 

.49036 

1.45 

.13943 

.42647 

1.90 

.06562

.47128 

2.35 

.02522 

.49061 

1.46 

.13742 

.42786 

1.91

.06439 

.47193 

2.36 

.02463 

.49086 

1.47 

.13542 

.42922 

1.92 

.06316 

.47257 

2.37 

.02406 

.49111 

1.48 

.13344

.43056 

1.93 

.06195 

.47320 

2.38 

.02349 

.49134 

1.49 

.13147

.43189 

1.94 

.06077 

.47381 

2.39 

.02294 

.49158 

1.50 

.12952

.43319 

1.95 

.05959 

.47441

2.40 

.02239 

.49180 

1.51

.12758 

.43448 


1.96

.05844

.47500

2.41 

.02186 

.49202 

1.52 

.12566 

.43574 

1.97 

.05730 


2.42 

.02134 

.49224 

1.53 

.12376 

.43699 

1.98 

.05618 

.47615 

2.43 

.02083 

.49245 

1.54

.12188 

.43822 

1.99 

.05508 

.47670 

2.44 

.02033 

.49266 

1.55 

.12001 

.43943 

2.00

.05399 

.47725 

2.45 

.01984 

.49286 

1.56 

.11816 

.44062 

2.01 

.05292 

.47778 

2.46 

.01936 

.49305 

1.57 

.11632 

.44179 

2.02 

.05186 

.47831 

2.47 

.01889 

.49324 

1.58 

.11450 

.44295 

2.03 

.05082 

.47882 

2.48 

.01842 

.49343 

1.59 

.11270

.44408 

2.04 

.04980 

.47932 

2.49 

.01797 

.49361 

1.60 

.11092 

.44520 

2.05 

.04879 

.47982

2.50 

.01753 

.49379

1.61 

.10915

.44630 

2.06 

.04780 

.48030 

2.51 

.01709 

.49396 

1.62 

.10741 

.44738 

2.07 

.04682 

.48077 

2.52 

.01667

.49413 

1.63 

.10567

.44845 

2.08 

.04586 

.48124 

2.53 

.01625 

.49430 

1.64 

.10396 

.44950 

2.09 

.04491 

.48169

2.54 

.01585 

.49446 

1.65 

.10226 

.45053 

2.10 

.04398 

.48214 

2.55 

.01545 

.49461 

1.66

.10059 

.45154 

2.11 

.04307 

.48257 

2.56 

.01506 

.49477

1.67 

.09893 

.45254 

2.12 

.04217 

.48300 

2.57 

.01468

.49492 

1.68 

.09728 

.45352 

2.13 

.04128 

.48341 

2.58 

.01431 

.49506 


1.69

.09566

.45449 

2.14 

.04041

.48382 

2.59 

.01394

.49520

1.70 

.09405 

.45543 

2.15 

.03955 

.48422 

2.60 

.01358 

.49534

1.71

.09246 

.45637 

2.16 

.03871 

.48461 

2.61 

.01323 

.49547

1.72 

.09089 

.45728 

2.17 

.03788 

.48500 

2.62 

.01289 

.49560

1.73 

.08933 

.45818 

2.18 

.03706 

.48537 

2.63 

.01256 

.49573 

1.74 

.08780 

.45907 

2.19 

.03626 

.48574 

2.64 

.01223 

.49585 

1.75

.08628 

.45994 

2.20 

.03547 

.48610 

2.65 

.01191 

.49598 

1.76 

.08478 

.46080

2.21 

.03470 

.48645 

2.66 

.01160 

.49609 

1.77 

.08329 

.46164

2.22 

.03394 

.48679 

2.67 

.01130 

.49621 

1.78 

.08183 

.46246

2.23 

.03319 

.48713 

2.68 

.01100 

.49632 

1.79 

.08038 

.46327 

2.24 

.03246 

.48745 

2.69 

.01071 

.49643







Normal Areas and Ordinates (Continued)


t

φ(t)

∫₀ᵗ φ(t) dt


t

φ(t)

∫₀ᵗ φ(t) dt


t

φ(t)

∫₀ᵗ φ(t) dt


2.70 

.01042 

.49653 

3.15 

.00279 

.49918 

3.60 

.00061 

.49984 

2.71 

.01014 

.49664 

3.16 

.00271 

.49921 

3.61 

.00059 

.49985 

2.72 

.00987 

.49674 

3.17 

.00262 

.49924 

3.62 

.00057 

.49985 

2.73 

.00961 

.49683 

3.18 

.00254 

.49926 

3.63 

.00055 

.49986 

2.74 

.00935 

.49693 

3.19 

.00246 

.49929 

3.64 

.00053 

.49986 

2.75 



3.20 

.00238

.49931 

3.65 

.00051 

.49987 

2.76 


.49711 

3.21 

.00231 

.49934 

3.66 

.00049 

.49987 

2.77 



3.22 

.00224 

.49936 

3.67 

.00047 

.49988 

2.78 


.49728 

3.23 

.00216 

.49938 

3.68 

.00046 

.49988 

2.79 


.49736 

3.24 

.00210 

.49940 

3.69 

.00044 

.49989 

2.80 

.00792 

.49744 

3.25 

.00203 

.49942 

3.70 

.00042 

.49989 

2.81 


.49752 

3.26 

.00196 

.49944 

3.71 

.00041

.49990 

2.82 

.00748 


3.27 

.00190 

.49946 

3.72 

.00039 

.49990 

2.83 


.49767 

3.28 

.00184 

.49948 

3.73 

.00038 

.49990 

2.84 

.00707 

.49774 

3.29 


.49950 

3.74 

.00037 

.49991 

2.85 


.49781 

3.30 


.49952 

3.75 

.00035 

.49991 

2.86 


.49788 

3.31 

.00167 

.49953 

3.76 

.00034 

.49992 

2.87 


.49795 

3.32 


.49955 

3.77 

.00033 

.49992 

2.88 

.00631 

.49801 

3.33 

.00156

.49957 

3.78 

.00031 

.49992 

2.89 

.00613 


3.34 

.00151

.49958 

3.79 

.00030 

.49992 

2.90 

.00595 

.49813 

3.35 

.00146 

.49960 

3.80 

.00029 

.49993 

2.91 

.00578 

.49819 

3.36 

.00141 

.49961 

3.81 

.00028 

.49993 

2.92 


.49825 

3.37 

.00136

.49962 

3.82 

.00027 

.49993 

2.93 

.00545

.49831 

3.38 

.00132

.49964 

3.83 

.00026 

.49994 

2.94 

.00530 

.49836 

3.39 

.00127 

.49965 

3.84 

.00025 

.49994 

2.95 


.49841 

3.40 

.00123

.49966 

3.85 

.00024 

.49994 

2.96 


.49846 

3.41 

.00119 

.49968 

3.86 

.00023 

.49994 

2.97 

.00485 

.49851 

3.42 

.00115 

.49969 

3.87 

.00022 

.49995 

2.98 

.00471 

.49856 

3.43 

.00111 

.49970 

3.88 

.00021 

.49995 

2.99 

.00457 

.49861 

3.44 

.00107 

.49971 

3.89 

.00021 

.49995 

3.00 

.00443 

.49865 

3.45 

.00104

.49972 

3.90 

.00020 

.49995 

3.01 

.00430 

.49869 

3.46 

.00100 

.49973 

3.91 

.00019 

.49995 

3.02 


.49874 

3.47 


.49974 

3.92

.00018 

.49996 

3.03 

.00405

.49878 

3.48 

.00094

.49975 

3.93 

.00018 

.49996 

3.04 

.00393 

.49882 

3.49 


.49976 

3.94 

.00017 

.49996 

3.05

.00381 

.49886 

3.50 


.49977 

3.95 

.00016 

.49996 

3.06 

.00370 

.49889 

3.51 

.00084

.49978 

3.96 

.00016 

.49996 

3.07 

.00358 

.49893

3.52 

.00081 

.49978 

3.97 

.00015 

.49996 

3.08 

.00348 

.49897 

3.53 

.00079

.49979 

3.98 

.00014 

.49997 

3.09 



3.54 

.00076

.49980 

3.99 

.00014 

.49997 

3.10 

.00327 

.49903 

3.55 

.00073 

.49981 




3.11 

.00317 

.49906 

3.56 

.00071 

.49981 




3.12 


.49910

3.57 

.00068 

.49982 




3.13 


.49913 

3.58 

.00066 

.49983 




3.14 

.00288

.49916 

3.59

.00063

.49983 
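
The two functions tabulated here are the standard normal ordinate φ(t) = e^(−t²/2)/√(2π) and the area under that curve from 0 to t. As a hedged illustration that is not part of the original text, both can be evaluated with Python's standard library alone; the function names `ordinate` and `area` are assumptions made for this sketch. The area uses the identity ∫₀ᵗ φ(u) du = ½ erf(t/√2).

```python
import math


def ordinate(t):
    """Standard normal ordinate (density) at t."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def area(t):
    """Area under the standard normal curve between 0 and t."""
    return 0.5 * math.erf(t / math.sqrt(2))


# Print a few rows in the table's layout: t, ordinate, area from 0 to t.
for t in (0.00, 0.50, 1.00, 2.00, 3.00):
    print(f"{t:4.2f}  {ordinate(t):.5f}  {area(t):.5f}")
```

Because the curve is symmetric about zero, the probability of falling within t of the mean is twice the tabulated area, and the probability of exceeding t on one side is .5 minus the tabulated area.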









































Table IV. Student’s t Distribution* 


Degrees of freedom n


Probability of a deviation greater than t

.005 

.01 

.025 

.05 

.1 

.15 

1 

63.657 

31.821 

12.706 

6.314 

3.078 

1.963 

2 

9.925 

6.965 

4.303 

2.920 

1.886 

1.386 

3 

5.841 

4.541 

3.182 

2.353 

1.638 

1.250 

4 

4.604 

3.747 

2.776 

2.132 

1.533 

1.190 

5

4.032 

3.365 

2.571 

2.015 

1.476 

1.156 

6 

3.707 

3.143 

2.447 

1.943 

1.440 

1.134 

7 

3.499 

2.998 

2.365 

1.895 

1.415 

1.119 

8 

3.355 

2.896 

2.306 

1.860 

1.397 

1.108 

9 

3.250 

2.821 

2.262 

1.833 

1.383 

1.100 

10 

3.169 

2.764 

2.228 

1.812 

1.372 

1.093 

11 

3.106 

2.718 

2.201 

1.796 

1.363 

1.088 

12 

3.055 

2.681 

2.179 

1.782 

1.356 

1.083 

13 

3.012 

2.650 

2.160 

1.771 

1.350 

1.079 

14 

2.977 

2.624 

2.145 

1.761 

1.345 

1.076 

15 

2.947 

2.602 

2.131 

1.753 

1.341 

1.074 

16 

2.921 

2.583 

2.120 

1.746 

1.337 

1.071 

17 

2.898 

2.567 

2.110 

1 1.740 

1.333 

1.069 

18 

2.878 

2.552 

2.101 

; 1.734 

1.330 

1.067 

19 

2.861 

2.539 

2.093 

1.729 

1.328 

1.066 

20 

2.845 

2.528 

2.086 

1.725 

1.325 

1.064 

21 

'‘''""Tlsi 

2.518 

2.080 

1.721 

1.323 

1.063 

22 

2.819 

2.508 

2.074 

1.717 

1.321 

1.061 

23 

2.807 

2.500 

2.069 

1.714 

1.319 

1.060 

24 

2.797 

2.492 

2.064 

1.711 

1.318 

1.059 

25 

2.787 

2.485 

2.060 

1.708 

1.316 

1.058 

26 

2.779 

2.479 

2.056 

1.706 

1.315 

1.058 

27 

2.771 

2.473 

2.052 

1.703 

1.314 

1.057 

28 

2.763 

2.467 

2.048 

1.701 

1.313 

1.056 

29 

2.756 

2.462 

2.045 

1.699 

1.311 

1.055 

30 

2.750 

2.457 

2.042 

1.697 

1.310 

1.055 

00 

2.576 

2.326 

1.960 

1.645 

1.282 

1.036 


The probability of a deviation numerically greater than t is twice the
probability given at the head of the table.

* This table is reproduced from Statistical Methods for Research Workers, with the generous 
permission of the author, Professor R. A. Fisher, and the publishers, Messrs. Oliver and Boyd. 
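
An entry of Table IV can be checked numerically by integrating the Student t density and solving for the value of t that leaves the stated probability in the upper tail. The following is a rough sketch using only the standard library, with Simpson's rule and bisection; the function names, step count, and search bracket are assumptions made for illustration, not part of the text.

```python
import math


def t_density(t, nu):
    """Student t frequency function with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def upper_tail(t, nu, steps=2000):
    """P(T > t) for t >= 0: one half minus the area from 0 to t (Simpson's rule)."""
    h = t / steps
    total = t_density(0.0, nu) + t_density(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 0.5 - total * h / 3


def t_critical(p, nu):
    """Value of t with upper-tail probability p, found by bisection on [0, 200]."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, nu) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.025, 10), 3))
```

For a two-sided statement, the probability of a deviation numerically greater than the returned value is twice p, as the note under the table says.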





Table IV. Student's t Distribution (Continued)


Degrees of          Probability of a deviation greater than t
freedom n       .2       .25       .3       .35       .4       .45

    1         1.376    1.000     .727     .510     .325     .158
    2         1.061     .816     .617     .445     .289     .142
    3          .978     .765     .584     .424     .277     .137
    4          .941     .741     .569     .414     .271     .134
    5          .920     .727     .559     .408     .267     .132

    6          .906     .718     .553     .404     .265     .131
    7          .896     .711     .549     .402     .263     .130
    8          .889     .706     .546     .399     .262     .130
    9          .883     .703     .543     .398     .261     .129
   10          .879     .700     .542     .397     .260     .129

   11          .876     .697     .540     .396     .260     .129
   12          .873     .695     .539     .395     .259     .128
   13          .870     .694     .538     .394     .259     .128
   14          .868     .692     .537     .393     .258     .128
   15          .866     .691     .536     .393     .258     .128

   16          .865     .690     .535     .392     .258     .128
   17          .863     .689     .534     .392     .257     .128
   18          .862     .688     .534     .392     .257     .127
   19          .861     .688     .533     .391     .257     .127
   20          .860     .687     .533     .391     .257     .127

   21          .859     .686     .532     .391     .257     .127
   22          .858     .686     .532     .390     .256     .127
   23          .858     .685     .532     .390     .256     .127
   24          .857     .685     .531     .390     .256     .127
   25          .856     .684     .531     .390     .256     .127

   26          .856     .684     .531     .390     .256     .127
   27          .855     .684     .531     .389     .256     .127
   28          .855     .683     .530     .389     .256     .127
   29          .854     .683     .530     .389     .256     .127
   30          .854     .683     .530     .389     .256     .127

    ∞          .842     .674     .524     .385     .253     .126


The probability of a deviation numerically greater than t is twice the
probability given at the head of the table.



Table V. F Distribution

[The entries of this table are not legible in this copy. A fragment of the credit line survives: "...giate Press, Iowa State College".]


Table VI. Random Digits

03991

10461

93716

16894

98953

73231

39528

72484 

82474 

2559?: 

38555 

95554 

32886 

59780 

09958 

18065 

81616 

18711 

53342 

4427€ 

17546 

73704 

92052 

46215 

15917 

06253 

07586 

16120 

82641 

2282C 

32643 

52861 

95819 

06831 

19640 

99413 

90767 

04235 

13574 

1720C 

69572 

68777 

39510 

35905 

85244 

35159 

40188 

28193 

29593 

88627 

24122 

66591 

27699 

06494 

03152 

19121 

34414 

82157 

86887 

55087 

61196 

30231 

92962 

61773 

22109 

78508 

63439 

75363 

44989 

16822 

30532 

21704 

10274 

12202 

94205 

20380 

67049 

09070 

93399 

45547 

03788 

97599 

76867 

20717 

82037 

10268 

79495 

04146 

52162 

90286 

48228 

63379 

85783 

47619 

87481 

37220 

91704 

30552 

04737 

21031 

88618 

19161 

41290 

67312 

71857 

15957 

48545 

35247 

18619 

13674 

71299 

23853 

05870 

01119 

92784 

26340 

75122 

11724 

74627 

73707 

27954 

58909 

82444 

99006 

04921 

73701 

92904 

13141 

32392 

19763 

80863 

00514 

20247 

81759 

45197 

25332 

69902 

63742 

78464 

22501 

33564 

60780 

48460 

85558 

15191 

18782 

94972 

11598 

62095 

36787 

90899 

75754 

60833 

25983 

01291 

41349 

19152 

00023 

12302 

80783 

78038 

70267 

43529 

06318 

38384 

74761 

36024 

00867 

76378 

41605 

55986 

66485 

88722 

56736 

66164 

49431 

94458 

74284 

05041 

49807 

87539 

08823 

94813 

31900 

54155 

83436 

54158 

34243 

46978 

35482 

16818 

60311 

74457 

90561 

72848 

11834 

75051 

93029 

47665 

64382 

34677 

58300 

74910 

64345 

19325 

81549 

60365 

94653 

35075 

3394£ 

45305 

07521 

61318 

31855 

14413 

70951 

83799 

42402 

56623 

34442 

59747 

67277 

76503 

34513 

39663 

77544 

32960 

07405 

36409 

83232 

16520 

69676 

11654 

99893 

02181 

68161 

19322 

53845 

57620 

52606 

68652 

27376 

92852 

55866 

88448 

03584 

11220 

94747 

07399 

37408 

79375 

95220 

01159 

63267 

10622 

48391 

31751 

57260 

68980 

0533G 

33521 

26665 

55823 

47641 

86225 

31704 

88492 

99382 

14454 

04504 

59589 

49067 

66821 

41575 

49767 

04037 

30934 

47744 

07481 

83828 

20554 

91409 

96277 

48257 

50816 

97616 

22888 

48893 

27499 

98748 

59404 

72059 

43947 

51680 

43852 

59693 

78212 

16993 

35902 

91386 

42614 

29297 

01918 

28316 

25163 

01889 

70014 

15021 

68971 

11403 

34994 

41374 

70071 

14736 

65251 

07629 

37239 

33295 

18477 

65622 

99385 

41600 

11133 

07586 

36815 

43625 

18637 

37509 

14707 

93997 

66497 

68646 

78138 

66559 

64397 

11692 

05327 

82162 

83745 

22567 

48509 

23929 

27482 

45476 

04515 

25624 

95096 

67946 

16930 

33361 

15470 

48355 

88651 

22596 

83761 

60873 

43253 

84145 

20368 

07126 

20094 

98977 

74843 

93413 

14387 

06345 

80854 

09279 

41196 

37480 

73788 

06533 

28597 

20405 

51321 

92246 

80088 

77074 

66919 

31678 

60530 

45128 

74022 

84617 

72472 

00008 

80890 

18002 

35352 

54131 

44372 

15486 

65741 

14014 

05466 

55306 

93128 

18464 

79982 

68416 

18611 

19241 

66083 

24653 

84609 

58232 

41849 

84547 

46850 

52326 

58319 

15997 

08355 

60860 

29735 

47762 

46352 

33049 

69248 

93460 

61199 

67940 

55121 

29281 

59076 

07936 

11087 

96294 

14013 

31792 

18627 

90872 

00911 

98936 

76355 

93779 

52701 

08337 

56303 

87316 

00441 

58997 

14060 

40619 

29549 

69616 

57275 

36898 

81304 

48585 

32624 

68691 

14845 

46672 

61958 

77100 

20857 

73156 

70284 

24326 

35961 

73488 

41839 

55382 

17267 

70943 

15633 

84924 

90415 

93614 

20288 

34060 

39685 

23309 

10061 

68829 

92694 

48297 

39904 

02115 

59362 

95938 

74416 

53166 

35208 

33374 

77613 

19019 

88152 

00080 

9978*? 

Q3478 

53159. 

67^33 

35663 

5‘>Q7‘l! 

38688 








Table VI. Random Digits (Continued)


27767 

43584 

85301 

88977 

29490 

69714 

94015 

64874 

32444 

48277 

13025 

14338 

54066 

15243 

47724 

66733 

74108 

88222 

88570 

74015 

80217 

36292 

98625 

24335 

24432 

24896 

62880 

87873 

95160 

59221 

10875 

62004 

90391 

61105 

57411 

06368 

11748 

12102 

80$80 

41867 

54127 

57326 

26629 

19087 

24472 

88779 

17944 

05600 

60|78 

03343 

60311 

42824 

37301 

42678 

45990 

43242 

66067 

42792 

95Ci43 

5268C 

49739 

71484 

92003 

98086 

76668 

73209 

54244 

91030 

45547 

70818 

78626 

51594 

16453 

94614 

39014 

97066 

30945 

57589 

31732 

5726C 

66692 

13986 

99837 

00582 

81232 

44987 

69170 

37403 

86995 

90307 

44071 

28091 

07362 

97703 

76447 

42537 

08345 

88975 

35841 

85771 

59820 

96163 

78851 

16499 

87064 

13075 

73035 

41207 

74699 

0931C 

25704 

91035 

26313 

77463 

55387 

72681 

47431 

43905 

31048 

5669C 

22304 

90314 

78438 

66276 

18396 

73538 

43277 

58874 

11466 

16082 

17710 

59621 

15292 

76139 

59526 

52113 

53856 

30743 

08670 

84741 

25852 

58906 

55018 

56374 

35824 

71708 

30540 

27886 

61732 

75454 

46780 

56487 

76211 

10271 

36633 

68424 

17374 

52003 

70707 

70214 

59849 

96166 

87195 

46692 

2678^ 

60939 

5®2 

11973 

02602 

3325C 

47670 

07654 

30342 

40277 

11049 

72049 

83012 

09832 

25571 

77628 

94304 

71803 

73465 

09819 

58869 

35220. 

09504 

96412 

90193 

79668 

08105 

59987 

21437 

36786 

49226 

77837 

98524 

97831 

65704 

09514 

64281 

61826 

18555 

64937 

64654 

25843 

41145 

42820 

14924 

3965C 

66847 

70495 

32360 

02985 

01755 

14750 

48968 

38603 

70812 

05682 

72461 

33230 

21529 

53424 

72877 

17334 

39283 

04149 

90860 

64618 

21032 

91050 

13058 

16218 

66554 

07850 

73950 

79552 

24781 

89683 

95362 

67011 

06651 

16136 

57216 

39618 

49856 

99326 

40902 

0506C 

49712 

97380 

10404 

55452 

09971 

59481 

37006 

22186 

72^85 

67385 

58275 

61764 

97586 

54716 

61459 

21647 

87417 

17198 

21443 

41808 

89514 

11788 

68224 

23417 

46376 

25366 

94746 

49580 

01176 

28838 

15472 

50669 

48139 

36732 

26825 

05511 

12459 

91314 

80582 

71944: 

12120 

86124 

51247 

44302 

87112 

21476 

14713 

71181 

13177 

55292 

95294 

00556 

70481 

06905 

21785 

41101 

49386 

54480 

23504 

23554 

66986 

34099 

74474 

20740 

47458 

64809 

06312 

88940 

15096 

69321 

80620 

51790 

11436 

38072 

40405 

68032 

60942 

00307 

11597 

92674 

55411 

85667 

77535 

99892 

71209 

92061 

92329 

98932 

78284 

46347 

95083 

06783 

28102 

57816 

85561 

29671 

77936 

63574 

31^84 

5192-! 

90726 

57166 

98884 

08583 

95889 

57067 

38101 

77756 

11667 

13897 

68984 

83620 

89747 

98882 

92613 

89719 

39641 

69457 

91339 

2250r 

36421 

16489 

18059 

51061 

67667 

60631 

84054 

40455 

99396 

6368C 

92638 

40333 

67054 

16067 

24700 

71594 

47468 

03577 

87649 

6326f 

21036 

82808 

77501 

97427 

76479 

68562 

43321 

31370 

28^77 

23896 

13173 

33365 

41468 

85149 

49554 

17994 

91178 

10174 

29120 

9043E 

86716 

38746 

94559 

37559 

49678 

53119 

98189 

81851 

29651 

84215 

92581 

02262 

41615 

70360 

64114 

58660 

96717 

54244 

10701 

4139C 

12470 

56500 

50273 

93113 

41794 

86861 

39448 

93136 

25722 

0856^ 

01016 

00857 

41396 

80504 

90670 

08289 

58137 

17820 

22751 

3651S 

34030 

60726 

25807 

24260 

71529 

78920 

47648 

13885 

70669 

9340C 

50259 

46345 

06170 

97965 

88302 

98041 

11947 

56203 

19324 

2050^ 

73959 

76145 

60808 

54444 

74412 

81105 

69IB1 

96845 

38625 

1160C 

46874 

37088 

80940 

44893 

10408 

36222 

14004 

23153 

69249 

05747 

60883 

5‘^109 

19516 

901 *^0 

46759 

716^3 


07589 

08809 

05085 



Table VII. Rank-Sum Critical Values* 

The sample sizes are shown in parentheses $(n_1, n_2)$. The probability
associated with a pair of critical values is the probability that $T \le$ smaller
value, or equally, it is the probability that $T \ge$ larger value. These
probabilities are the closest ones to .025 and .05 that exist for integer values of T.
The approximate .025 values should be used for a two-sided test with $\alpha = .05$,
and the approximate .05 values for a one-sided test.



(n1, n2)     Smaller    Larger    Probability
             value      value

 (2, 4)         3         11         .067
 (2, 5)         3         13         .047
 (2, 6)         3         15         .036
                4         14         .071
 (2, 7)         3         17         .028
                4         16         .056
 (2, 8)         3         19         .022
                4         18         .044
 (2, 9)         3         21         .018
                4         20         .036
 (2, 10)        4         22         .030
                5         21         .061

 (3, 3)         6         15         .050
 (3, 4)         6         18         .028
                7         17         .057
 (3, 5)         6         21         .018
                7         20         .036
 (3, 6)         7         23         .024
                8         22         .048
 (3, 7)         8         25         .033
                9         24         .058
 (3, 8)         8         28         .024
                9         27         .042
 (3, 9)         9         30         .032
               10         29         .050
 (3, 10)        9         33         .024
               11         31         .056

 (4, 4)        11         25         .029
               12         24         .057
 (4, 5)        12         28         .032
               13         27         .056
 (4, 6)        12         32         .019
               14         30         .057
 (4, 7)        13         35         .021
               15         33         .055
 (4, 8)        14         38         .024
               16         36         .055
 (4, 9)        15         41         .025
               17         39         .053
 (4, 10)       16         44         .026
               18         42         .053

 (5, 5)        18         37         .028
               19         36         .048
 (5, 6)        19         41         .026
               20         40         .041
 (5, 7)        20         45         .024
               22         43         .053
 (5, 8)        21         49         .023
               23         47         .047
 (5, 9)        22         53         .021
               25         50         .056
 (5, 10)       24         56         .028
               26         54         .050

 (6, 6)        26         52         .021
               28         50         .047
 (6, 7)        28         56         .026
               30         54         .051
 (6, 8)        29         61         .021
               32         58         .054
 (6, 9)        31         65         .025
               33         63         .044
 (6, 10)       33         69         .028
               35         67         .047

 (7, 7)        37         68         .027
               39         66         .049
 (7, 8)        39         73         .027
               41         71         .047
 (7, 9)        41         78         .027
               43         76         .045
 (7, 10)       43         83         .028
               46         80         .054

 (8, 8)        49         87         .025
               52         84         .052
 (8, 9)        51         93         .023
               54         90         .046
 (8, 10)       54         98         .027
               57         95         .051

 (9, 9)        63        108         .025
               66        105         .047
 (9, 10)       66        114         .027
               69        111         .047

 (10, 10)      79        131         .026
               83        127         .053


* This table was extracted from a more complete table (A-20) in Introduction to
Statistical Analysis, 2nd edition, by W. J. Dixon and F. J. Massey, with permission from
the publishers, the McGraw-Hill Book Company.
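
For sample sizes this small, the probabilities in Table VII can be verified by direct enumeration. The sketch below assumes that T is the sum of the ranks of the first sample (size n1) in the combined ranking, and that under the hypothesis every choice of n1 ranks out of n1 + n2 is equally likely; the function names are illustrative only.

```python
from itertools import combinations
from math import comb


def lower_tail(n1, n2, c):
    """P(T <= c), where T is the rank sum of the sample of size n1."""
    ranks = range(1, n1 + n2 + 1)
    hits = sum(1 for chosen in combinations(ranks, n1) if sum(chosen) <= c)
    return hits / comb(n1 + n2, n1)


def larger_value(n1, n2, c):
    """Upper critical value obtained by reflecting c about the centre of T's range."""
    return n1 * (n1 + n2 + 1) - c


# Example: the (4, 4) entry with smaller value 11.
print(round(lower_tail(4, 4, 11), 3), larger_value(4, 4, 11))
```

Because the distribution of T is symmetric, the probability that T is at least the larger value equals the probability that T is at most the smaller value, which is why one probability serves for each pair.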









Answers to Odd-Numbered Exercises


Numerical answers often depend upon the order of operations and the
extent of rounding off; hence a student’s answers may differ slightly
from those given here. For many of the theoretical exercises, suggestions
on how to proceed toward a solution have been given, but these should
be used only as a last resort.


Chapter 2 


1. 


1 

9‘ 


5 . 


9. 


13. 


16 ■ 

i as compared with ^ 
12 ‘ 


17. 


21 . 


4 2 1 
7’7’7' 


- (*)/(:)• 

(:)(n)/(n)- 

1 8 

33. (a)-,(6)~. 


3. 


7 . 


11. 


1 

3 ’ 






it 

3 ’ 




£ 

16 



19. .43. 


23. 


27. 


2162 

54145* 

2 


31. 1 5(w^ - 7/2 + 14)/(/2» - 3/2^; + 2/2). 


35. 


39. 




iif 

843 * 


(a) .184, (b) .736, (c)


00 1 ^ 

y h 

x=o 


e. 




41. (fl) -,(*)/(*) = 15/4a;! (5 


43. f(*) = 0,a!<0,f(0) = l,F(l) = l, 




i=’(a!) = l.a!^5. 


45. /(=.) = (2 ! ^) / (2) • = 0. * < 0. 

^•(!») = ^.0^a><l,F(a!)=l,l ^»<2, 

F(x) = I, :e S: 2. 

47. /(*) = 4 (^4] 49. f { x , y ) = -. 

51. /(a;, 3/) = 5120/813?! t/! (6 — a? — 2/)! 4®+^. 

53. (a)/(l) = \ , (6)^(0) = I . (c)/(0 1 1) = ./(I I 1) = . 

(4)^(0 l,0) = |,^(ll0)=|. 

55. /(®) = i./(y|®) = ^. 57. /(I) = .40, ^{3) = .054. 


59. («)/(0) = H ,/(i) = ^ ,/(2) = 1 , (b)f(3, 1 1) = 



61. The conditional distribution of y for x fixed will depend on x. Thus $f(1, 0) = 0$
and $f(1, 1) > 0$.

63. (a) c = 1, (b) .264, (c) .537.

65. (a) $F(x) = 0$, $x \le 0$, $F(x) = x$, $0 < x < 1$, $F(x) = 1$, $x \ge 1$,

(b) $F(x) = 0$, $x < 0$, $F(x) = x^2/2$, $0 < x < 1$, $F(x) = -x^2/2 + 2x - 1$, $1 < x < 2$,

$F(x) = 1$, $x > 2$, (c) $F(x) = \frac{1}{\pi} \tan^{-1} x + \frac{1}{2}$.


67. (a) 500, (6) 1 - — , (c) i 

a? 2 

69. .25. 

71. (a) $\frac{7}{256}$, (b) $\frac{1}{16}$, (c) $\frac{31}{256}$, (d) if x and y were independent,

$P\{x < .5,\ y < .25\} = P\{x < .5\}P\{y < .25\}$. But not true here.

73. Find $f(x, y)$, then sum with respect to x from y to $\infty$ by letting $t = x - y$ and
summing t from 0 to $\infty$.



$\{-1 < x < 0,\ -1 < y < 0\}$ and $\{0 < x < 1,\ 0 < y < 1\}$





Chapter 3 


1. (a) $\alpha = .5$, $\beta = .25$, (b) $\alpha = 0$, $\beta = .75$.

3. Choose $x > 4.5$. Then $\beta = .25$.

5. (a) $\alpha = .25$, (b) .125.

7.


IZ 

24* 



9, P(e) = 

11. $P(p) = p^3 - 3p^2 + 3p$. Poor critical region if p is small, good if p is close to 1.
13. $\hat{\theta} = x$; hence $\hat{\theta} = 1$ here.

15. 6^ VHx.^ln, . V 


17. $f(x; p) = \binom{2}{x} p^x (1 - p)^{2 - x}$. For $x = 0$, $p = 0$ maximizes; for $x = 1$,

$p = \frac{1}{2}$ maximizes; for $x = 2$, $p = 1$ maximizes.


Chapter 4 


1. Boundaries 97.5-107.5 and 217.5-227.5. Class marks 102.5 and 222.5.
3. Class marks are 156, 159, 162, . . . .

5 , X = 4.43,.: . 



9. Approximately 71 per cent and 96 per cent. The right tail contributes more and the
left tail less than expected for a normal distribution, but the sum is fairly close to
expectation.

11. Guess the mean to be 5'8" and that 95 per cent have heights between 5'3" and 6'1".
Then, under normality, $\sigma$ = 2.5".

13. Nothing definite can be concluded; however, in a common empirical distribution,
one would guess that the distribution has a long right tail.


15. /Wft = mf — 




n\ 

17. Write («i + = J [(»( - «i) + (*i - ~ * 2 ) + (^2 ~ 

1 « 1+1 

then expand the binomials, sum term by term, and evaluate. 

19. The third moment about the mean is zero, yet the distribution is not symmetrical.
This shows that one cannot rely on the third moment about the mean as a measure
of symmetry.





Chapter 5 


1 . 


3. 


5. 


7. 


9. 

11 . 


13. 



E(x) = 0, 



He averages the same, namely zero, but there are fewer 


extreme wins and losses. 

1 . 


(a) .27, (b) .74.

(a) P{x < 2} = .411, (b) Successive days are not likely to behave like independent 
trials. Storms often last more than one day. 


9 27 

15 . 

17. ^ ^ - 36. 


21. .60. 

23. 124, using a normal approximation. 

25. .423. 

27. {a) In the expression for MJ,B)y write factor out and recognize 

the resulting series as the expansion of 




33. a = 

3 

35. {d) c = 2, {b) - ^ ■ , (c) 2{de^ — -h 1V0^ {d) expand and simplify. 

K ~Y 2. 

37. (a) c = 1/a!, (b) (a + k)!/a!, (c) (1 -
39. $M(\theta) = (e^{\theta} - 1)^2/\theta^2$.

41. (a) $x_0 = 3.56$, (b) $x_0 = -2.68$.

43. .43. 

45. $\mu_{2k} = (2k)!\,\sigma^{2k}/2^k k!$ and $\mu_{2k+1} = 0$.

47. (a) .240, (b) .620.

49. (a) .112, (b) .112.

51. Yes, since 180-220 includes 215. 

53. Not typical. 

55. .006. 

57. p = .0255; hence limits are .0255 ± .0150. Out of control on days numbered 18, 
22, and 38. 

59. (a) 1614, (b) 1681, since n is a maximum for p = .5.




61. .003. 

63. id) P{x ^ 20.5} = .00045, (b) .973. f 

65. (a) .60, (b) .0104.

67. (a) 2.48, (b) 62.

69. Using $\bar{x}/n = .2175$ as the estimate for p, the binomial frequencies to the nearest
unit are 7, 19, 24, 18, 9, 3, 1, 0, 0, 0, 0.

71. .74, .70. 


73 . 


75 . 


m 

729' 

5 

To8 * 


77. .0034. 

— — 1 — — 2 j/— 

79. (a)(2/-l)e® ,(b)eV2Ac)e 

1 /y-i\ 

81. {a) e (ft) (y - 1) ^ e " jlVn. 


83 . 


85 . 


— _ 

Vz 

-ilog*® 


1 . 

jxV In. 


J »oo g 200j>(l-p) 

$\sum_{x=0}^{49} e^{-10t}(10t)^x/x!$. Calculate the probability that fewer than 50 tickets will be

sold in t minutes, assuming a Poisson distribution is valid for a time interval
of t minutes.
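
The probability that fewer than 50 tickets are sold can be evaluated by summing Poisson terms. This is a minimal sketch that assumes, as the partly legible formula suggests, a Poisson mean of 10t for an interval of t minutes; the rate of 10 per minute and the function name are assumptions, not confirmed by the text.

```python
import math


def prob_fewer_than(k, mean):
    """P(X < k) for a Poisson variable X with the given mean."""
    term = math.exp(-mean)      # P(X = 0)
    total = 0.0
    for x in range(k):
        total += term
        term *= mean / (x + 1)  # P(X = x + 1) from P(X = x)
    return total


# Hypothetical rate of 10 tickets per minute, so the mean for t minutes is 10 * t.
for t in (4.0, 5.0, 6.0):
    print(t, round(prob_fewer_than(50, 10 * t), 4))
```

The probability falls as t grows, since a longer interval makes 50 or more sales increasingly likely.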


Chapter 6 


7. If $n_1$ and $n_2$ represent the number of trials for each binomial variable, $n = n_1 + n_2$
represents the number of trials for the combined experiment because p is the same
for both.

9. The moment generating function of canhb^^^^ expressed in tKe form 

11* E{z) = (n — l)/2, V(z) = (/^ + 1)/12. The latter follows from calcu- 


lating E 


^ ~ 5) ~ 5) ~ 

~ 0 ~ 0 ~ unless y = i + 1, in which 

ir\ 


case it has ^Hhe value 




«*+2 which occurs with probability i 
13, (fl) 400 ± 7.5, (ft) 225 : 

15 . 64 . 


~ because aJi = 1 anda?ivi = 1 implies ni < < 


^7. 9. 




21. (a) $\sigma_{\bar{x} - \bar{y}} = 1.3$; hence reject the hypothesis, (b) $\bar{x} - \bar{y}$ very likely differs from

$\mu_x - \mu_y$ by less than 2.6 units, (c) n = 656.

23. n = 13.3; hence 14 will suffice.

25. 84.

27. t = 1.66; hence accept pi ^ 

29. = .09. If testing against $p_1 \ne p_2$, not significant. If testing against $p_1 > p_2$,

significant. A significant result corresponds to a discriminating question.

31. t = 1.68; hence reject $p_1 - p_2 = .10$ as against $p_1 - p_2 > .10$.

33. (a) $\bar{x} = 71.6$, $s = 31.1$; hence limits are 30-113. (b) One would hardly be justified
on the basis of this small and seemingly irregular sample to assume that production
was under control.

35. $M_{x+y}(\theta) = (e^{\theta} - 1)^2/\theta^2$. Since this is not of the form $(e^{b\theta} - e^{a\theta})/\theta(b - a)$, which is

the moment generating function for a horizontal variable in the interval (a, b), the
variable cannot have a horizontal distribution.

37. $M_{n\bar{x}}(\theta) = (1 - \theta)^{-n}$; hence, from problem 34, $f(n\bar{x}) = e^{-n\bar{x}}(n\bar{x})^{n-1}/(n - 1)!$.

41. $n f(z) F^{n-1}(z)$.


Chapter 7 

1. .91. 

3. Relationship is not likely to be linear. For large amounts of fertilizer there may be 
a loss; hence the range of values here for the amount of fertilizer added is probably 
fairly large. 

5. Maximum traffic occurs around 8-9 a.m. and near 5 p.m. Maximum tides occur 
around 8 a.m. and 8 p.m. Thus maxima and minima occur fairly close together in 
time, yielding a scatter diagram that is strongly linear in character. 

7. ^ — 1.2; hence accept the hypothesis. 

9. y' = 6.29x - 274.

11. exp [-2b{Xi - cY] = J Vi^i exp [-b(Xi - cf] 

- c)^] = '^yi(Xi — c)xi exp [—b{Xi - cY] 

["“26(3:* — cY] =2 “ <^Y^i exp [—b{Xi — cY], 

15. $y' = .39x_1 + .25x_2$.

17. = $(n^2 - 1)/12$.

19. (< 2 ) r = .61 for a 6 X 6 classification, (b) y' = .0116a?i — \.\\y{c)y' = .0120a;i — 
.007a?2 .97, {d) additional variable in (c) gives very little improvement. 

21. y' = .857€*33iie. 

23. Direct least squares, if y increases with x. 

27. (a) $z = x_1 - .753x_2$. (b) Very little improvement over $x_1$ alone.

n 

29. a, 31. .11. 

ib = l 


Chapter 8 


1. (a) .15, (b) .5, (c) .5, (d) 0, (e) .5, (f) .75, (g) .5, (h) $\pi/4$.

3. (a) $f(x) = 2x$, $g(y) = 2(1 - y)$, (b) $f(y \mid x) = 1/x$, $f(x \mid y) = 1/(1 - y)$, (c) $\mu_{y \mid x} = x/2$.

5. (a) $f(x) = e^{-x}$, $g(y) = (y + 1)^{-2}$, (b) $f(y \mid x) = xe^{-xy}$, $f(x \mid y) = (y + 1)^2 xe^{-x(y+1)}$,
(c) $\mu_{y \mid x} = 1/x$.




7. (a) $\mu'_{pq} = 1/(p + 1)(q + 1)$, (b) $\rho = 0$, (c) $\mu_{y \mid x} = \frac{1}{2}$.

9. (a) c = 1, (b) $y = -x/2$, $-1 < x < 0$, $y = x/2$, $0 < x < 1$, (c) 0.

11, Any f(x) times any conditional distribution f(y \ x) which has the curve y == a?® 
value as its mean value. 

20 5 

13. (fl) = 1, fj>y = -^2, Gg^ = ^ , (7y® = ~ , p = .6, 


15. 

17. 

19. 


21 . 

23. 


25. 

31. 


(.b)f{x) = 4e ‘ /ViOtt, (c) = .3(x - 1) - 2. 

(a) multiply it by 2, (b) divide it by the integral of $f(x)$ from 0 to $\infty$.


f^y\x — M'V' 

$E(x) = np$, $E(y) = n(n + 1)p/2$, $E(xy) = [1 + (n - 1)p]\,pn(n + 1)/2$,
$\mu_{11} = n(n + 1)p(1 - p)/2$.

as compared to 1 . ' 

16 




$\mu'_{00} = 1$, $\mu'_{10} = \mu_x$, $\mu'_{01} = \mu_y$, $\mu_{20} = \sigma_x^2$, $\mu_{02} = \sigma_y^2$ are verified by employing
result (18) in the text and symmetry, whereas $\mu_{11} = \rho\sigma_x\sigma_y$ requires the evaluation
of the integral defining $\mu_{11}$.

For example, a uniform distribution in an ellipse. 


1/(1 + tc)\ 


Chapter 9 

1. Choose as critical region the sample points below the line $x_1 + x_2 = c$, where c is
the proper constant. This is given by the Neyman-Pearson lemma.

3. .69. 

5. Normal approximation yields x > 24.0 as critical region, • 

1 r°° —*2 

P(p) = at. P(.6) = .44. 

^ 27r j24.5-40y 

V 40m ; ^ 

7. n = 13.6; hence a sample of 14 is needed.

9. Critical region given by S log ajt constant. 

11. S 37^ > constant. Yes. ^ 

13. A = (e/rt)^«[S(a?,. - b 

15. Critical region given by (p^lp)%q^lqY~^ < Aq, where p = xin and ^ = 1 ^ (a;/^i). No 

17. Critical region given by (7737^)^0- <5/(1 + §)« < Aq, where § = —(1 + «/S log3?i). 

19. = 5.1 with 1 degree of freedom; hence reject J^o- 

21. ^aibiGplV'Ha^GpZbpa^. f ' 

23. (a) PSi, (b) I, (c) equal weights variance is about 37 per cent larger than for 
minimizing weights. 

27. Calculate the probability of 3? — 1 successes in A' — I trials, followed by a success. 

29. yxilnN. 31. fi = x, 6= Vt(xi - £)>ln. 

1 

33. q = S37,/(2:3;, + n), 37, 10.1 - 1 1 .9. 

39. Treat as a binomial problem with /? = xjN and 2 successes in 2 / trials. 

41. 17.8 -■72.5."' ■ 




Chapter 10 

3. $\chi^2 = 1.3$ with 1 degree of freedom; hence compatible.

5. $\chi^2 = 35$ with 3 degrees of freedom; hence not compatible with theory.

7. $\chi^2 = 4.2$ with 7 degrees of freedom (combining last 2 cells); hence fit is satisfactory.

9. $\chi^2 = 8$ with 9 degrees of freedom if first 2 cells are combined and last 3 cells are
combined; hence fit is satisfactory.

11. $\chi^2 = 2.9$ with 3 degrees of freedom if frequencies for $x \ge 4$ are combined; hence
fit is satisfactory.

13. $\chi^2 = 4.8$ with 1 degree of freedom; hence reject the hypothesis. It appears that the
serum had a harmful effect on the patients.

15. Obviously not independent, since the contribution to $\chi^2$ from the first cell alone is
larger than the critical value of $\chi^2$ for 12 degrees of freedom.

17. $\chi^2 = 26.6$ with 9 degrees of freedom; hence reject homogeneity.

19. $\chi^2 = 20$ with 14 degrees of freedom; hence justified.

21. Take the logarithm of the multinomial frequency function for 3 cells and
differentiate with respect to $q = 1 - p$.

23. Replace a + b + c + d by n, a + c by $n_1$, b + d by $n_2$, and let $\hat{p} = (a + b)/n$ serve
as the estimate of p. Then $\chi^2$ will assume the form $(\hat{p}_1 - \hat{p}_2)^2/\hat{p}\hat{q}(1/n_1 + 1/n_2)$.
For 1 degree of freedom the critical value of $\chi^2$ is the square of the corresponding
normal curve critical value for this problem.
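
The identity behind this answer can be checked numerically. The sketch below assumes a 2 x 2 table whose columns are the two samples, with a and b the numbers of successes and c and d the numbers of failures, so that $n_1 = a + c$ and $n_2 = b + d$; the cell counts used are hypothetical and serve only as an illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2 x 2 contingency table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def squared_normal_statistic(a, b, c, d):
    """Square of the usual normal statistic for the difference of two proportions."""
    n1, n2 = a + c, b + d
    p1, p2 = a / n1, b / n2
    p = (a + b) / (n1 + n2)   # pooled estimate of p
    q = 1 - p
    return (p1 - p2) ** 2 / (p * q * (1 / n1 + 1 / n2))


# Hypothetical counts: successes 30 and 20, failures 10 and 40.
print(chi_square_2x2(30, 20, 10, 40), squared_normal_statistic(30, 20, 10, 40))
```

Both expressions give the same value for any table with non-empty margins, which is why the 1-degree-of-freedom critical value is the square of the normal one.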


Chapter 11 

1. $M_z(\theta) = (1 - 2\theta)^{-n}$; hence z possesses a $\chi^2$ distribution with 2n degrees of freedom.

3. $16 < \sigma^2 < 48$.

5. $15 < \sigma^2 < 64$.

7. $\mu = \nu$ and $\sigma^2 = 2\nu$.

9. $E(ks^2 - \sigma^2)^2 = \sigma^4\{[(n + 2)/n]k^2 - 2k + 1\}$; hence choose $k = n/(n + 2)$. This

shows that $\sum x_i^2/(n + 2)$ is a better estimate than the unbiased estimate $\sum x_i^2/n$.
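
The choice of k in answer 9 can be followed through in a few lines. The sketch below assumes, as the stated result implies, that $ns^2/\sigma^2$ has a $\chi^2$ distribution with n degrees of freedom (for instance $s^2 = \sum x_i^2/n$ for a normal variable with mean zero), so that $E(s^2) = \sigma^2$ and $E(s^4) = \sigma^4(n + 2)/n$.

```latex
\begin{align*}
E(ks^2 - \sigma^2)^2 &= k^2 E(s^4) - 2k\sigma^2 E(s^2) + \sigma^4 \\
  &= \sigma^4\left\{\frac{n+2}{n}\,k^2 - 2k + 1\right\}, \\
\frac{d}{dk}\left\{\frac{n+2}{n}\,k^2 - 2k + 1\right\}
  &= \frac{2(n+2)}{n}\,k - 2 = 0
  \quad\Longrightarrow\quad k = \frac{n}{n+2}.
\end{align*}
```

Since the coefficient of $k^2$ is positive, this stationary point is a minimum of the mean squared error.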

11. (a) $t = -2.6$ and $t_0 = 2.09$; hence reject $H_0$. (b) $38.7 < \mu < 45.3$.

13. $17.5 < \mu < 22.5$.

15. The integrand in the integral yielding E(t) is an odd function; hence the integral
must vanish.

17. (a) $t = -1.6$; hence accept $H_0$. (b) $-9.2 < \mu_1 - \mu_2 < 1.2$.

19. t = 2.23 with 38 degrees of freedom; hence reject the hypothesis. The x and y
values are undoubtedly correlated here.

21. $-2.43 < \mu_1 - \mu_2 < -.52$.

23. First show that $\lambda = [\sum (x_i - \bar{x})^2/\sum (x_i - \mu_0)^2]^{n/2} = [1 + t^2/(n - 1)]^{-n/2}$; then show

the relationship between the distributions of $\lambda$ and t.

25. Critical region is $[\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2]/[\sum (x_i - \hat{\mu})^2 + \sum (y_i - \hat{\mu})^2] < \lambda_0$ where

$\hat{\mu} = (\bar{x} + \bar{y})/2$. This can be shown to reduce to $(\bar{x} - \bar{y})^2/(ns_x^2 + ns_y^2) > c_0$, which
is equivalent to the square of a Student t variable test with 2(n - 1) degrees of
freedom.

27. F = 2.1 ; hence accept 

29. Choose $F_0$ such that $P\{F > F_0\} = .05$ and $P\{F > F_0/2\} = .95$. Hence $F_0 = 1.41$.
Tables yield n = 93.

- 1 ) 1 ^ n^syjn^ ^ r ' 

n2S2\ni — 1 ) Fq ay' — 1 ) ® * 


31 . 




where $F_0$ and $F_0'$ are the right-tail critical values of F corresponding to $\nu_1 = n_1 - 1$,

$\nu_2 = n_2 - 1$ and $\nu_1 = n_2 - 1$, $\nu_2 = n_1 - 1$, respectively.

33. The variable $2ax$ is a $\chi^2$ variable with 2 degrees of freedom; hence the ratio $2ax/2ay = x/y$ possesses an $F$ distribution with $\nu_1 = 2$ and $\nu_2 = 2$. Now apply a two-sided, or one-sided, $F$ test, depending on the alternative hypothesis.

35. < Ao is the critical region. This is equivalent to 

-f I]2/i®/2a?i2)«i+«2 < Cq. But is an F variable; hence 

the test is equivalent to F^a/t/ii + < Cj. But this will be satisfied if, and only if, $F < c_2$ or $F > c_3$; hence the test is equivalent to a two-sided $F$ test.

37. W^e-^ (5) - loge^. ;; 

39. Use the formula for the frequency function of a quotient.

_*2 

41 . ze 43. (« - 

45. 46, which is the smallest integer exceeding the positive root of the equation $(.9)^{n-1} = .5/(n + 9)$.
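
The integer in answer 45 can be recovered by direct search. The sketch below reads the equation as $(.9)^{n-1} = .5/(n + 9)$; the function name and its parameters are hypothetical. Since $(.9)^{n-1}(n + 9)$ decreases steadily for $n \ge 1$, the first $n$ at which the left side no longer exceeds the right side is the smallest integer beyond the root.

```python
def smallest_n(p=0.9, numerator=0.5, offset=9):
    """Smallest integer n with p**(n - 1) <= numerator / (n + offset)."""
    n = 1
    while p ** (n - 1) > numerator / (n + offset):
        n += 1
    return n

print(smallest_n())  # should agree with the stated answer of 46
```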

47. 40 ±.58i^. 

49. z = ^ where Fhas an F distribution with and ^ ni\ hence 

f(z) = q. «2ir)i(ni+n2-4) -- — 2;)^<«2--aiV. 

51. / = [F/ - ^ + 

degrees of freedom. ;; 


Chapter 12 

1. $F = 2.5$ and $F_0 = 3.06$; hence accept $H_0$.

3. = 16, 

5. $F_r = 5.76$ and $F_0 = 5.14$; hence rations differ. $F_t = 6.22$ and $F_0 = 4.76$; hence types differ.

7. $n = 85.3$; hence 86 necessary. 9. = 16, — 6.

15. Study the ratio $P\{x \mid p\}/P\{x \mid p'\}$.

17. V(x) =: E(x^) ^ E\x) = S(l/^)F(a?,^) - [^(llk)E(x^r = (1/A:)i:[«;?,^, 4- - 

Writing out the sums defining - ^3,) + «(« - l)F(j9) will yield 
this same result. 

hence the variance is f as large. iP 

21. Because variances of means and proportions, for example, are inversely proportional to sample sizes. Thus ratio of variances corresponds to ratio of sample sizes needed for the same precision of estimate.


Chapter 13 



1 . 


3. 


5 . 




Eleven successes in 14 trials is not significant for a two-sided test because $\tau = (10.5 - 7)/\sqrt{3.5} = 1.87$.
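
The arithmetic behind this sign-test answer can be reproduced as follows. This is a minimal sketch: the variable names are illustrative, and the exact binomial calculation and the 5 per cent level used for comparison are additions assumed here, not part of the printed answer.

```python
from math import comb, sqrt

n, successes, p = 14, 11, 0.5

# Normal approximation with a continuity correction of one-half.
mean = n * p                      # 7
sd = sqrt(n * p * (1 - p))        # sqrt(3.5)
z = (successes - 0.5 - mean) / sd

# Exact two-sided probability from the binomial distribution, for comparison.
upper_tail = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n
two_sided_p = 2 * upper_tail

print(round(z, 2), round(two_sided_p, 4))
```

The standardized value falls short of the usual two-sided 5 per cent critical value of 1.96, and the exact two-sided probability is likewise above .05, so both calculations lead to the same conclusion.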

Alternating tied pairs, but beginning with the two-group for two-and-one ties, yields = .71; hence accept the hypothesis.




13. No, because the total number of runs is close to what would be expected under 
randomness. 

17. 2 or 4. 

19. By algebra, or by arguing that a test based on R cannot be affected because a test based on the complete serial correlation coefficient (circular) cannot be affected.

21. $D_{.05} = 1.36/\sqrt{500} = .06$. The maximum difference between theoretical and sample distribution functions is .016; hence fit is satisfactory.
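
The comparison in answer 21 is a single division, sketched below. The function name is hypothetical; the coefficient 1.36 and the observed maximum difference .016 are taken from the answer itself.

```python
from math import sqrt

def ks_critical_value(n, coefficient=1.36):
    """Large-sample 5 per cent critical value, coefficient / sqrt(n)."""
    return coefficient / sqrt(n)

d_crit = ks_critical_value(500)
d_observed = 0.016   # maximum difference quoted in the answer

print(round(d_crit, 3), d_observed < d_crit)
```

The observed difference is well below the critical value, which is why the fit is judged satisfactory.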

23. Order $y$ values according to magnitude, form the sample distribution function, then add and subtract .19 to it to obtain the desired band.


Chapter 14 

1. Boundaries for dm are log 4/log | — w log |/log f and —log 4/iog f — 
m log f/log f . 

3. Boundaries for Sa;* are log # — w log f and log 8 — /w log f . 

9. $E_1(n) = 19$. 11. $n = 34$.

13. Same solutions as for problem 12, namely {a) — i and J and (6) — J — log^ 2 and 
i — loge 2; hence y is of no help here. 

15. Boundary is the line y = 2x. 

17. (a) risk = 0/2, mean risk = J, (b) mean risk hence second estimate is better, 

19. 

n n 

21. E(ji I a?) = a; + a. 



Index 


Addition theorem, 9 
Analysis of variance, 299 

components of variance model, 307 
estimation in, 311
generalizations in, 313 
linear hypothesis model, 300 
Array distribution, continuous variables, 
193 

discrete variables, 28 
normal variables, 200 
Asymptotic distribution, 234 
Average outgoing quality limit, 324 

Bayes’ formula, 16 
Bayes methods, 367, 369 
Bayes solutions, 369 
Bernoulli distribution, 86 
Best estimate, 232, 381 
Bias, 229 

Binomial distribution, 86 
moment generating function, 89 
moments, 87 

normal approximation, 104 
Poisson approximation, 91 
sketch, 91 

Binomial index of dispersion, 257 

Cauchy distribution, 79 
Central limit theorem, 145 
Change of variable, 119, 121 
Chi-square distribution, 122, 153 
additive property of, 268 
applied to variances, 268 
moment generating function, 154 
Chi-square test, 244 
for contingency tables, 253 
for curve fitting, 250 
for indices of dispersion, 255 
generality of, 249 
limitations on, 247 


Classification of data, 65 
Class mark, 66 

Coefficient, correlation, 163, 196
regression, 171, 279 
Combinations, 18 
Combinatorial formulas, 17
Components of variance, 307 
Composite hypothesis, 213

Conditional distribution, continuous, 
193 

discrete, 28 
normal, 200 

Conditional expected value, 323
Conditional probability, 11
Confidence coefficient, 237 
Confidence intervals, 234

Confidence limits, 236 

for means, 275, 277

for proportions, 239 
for regression coefficients, 279
for variances, 270 
Consumer’s risk, 320 
Contingency tables, 252 
Continuous frequency function, 32, 35 
Continuous random variable, 33, 189 
Control chart, for means, 146
for proportions, 114 
Correlation, linear, 160 
serial, 341 

Correlation coefficient, calculation of, 
165 

empirical, 163 
estimation of, 203 
interpretation of, 164 
properties, 375 
reliability, 166 
theoretical, 196

Covariance, 196

Cramer-Rao inequality, 379



Critical region, 48 
best, 214 
size of, 52 

Cumulative distribution function, 23 
Curve fitting, 169, 175, 177 
chi-square test for, 250 
Curve of regression, 194 
Curvilinear regression, 175 

Defective, fraction, 319
Degrees of freedom, 153 
Design, statistical, 297 
Deviation, mean, 78 
standard, 73, 83 

Difference, of two means, confidence 
limits for, 277 
distribution of, 146, 277 
of two proportions, 148 
testing, 277, 279 
Discrete frequency function, 22 
Discrete random variable, 32 
Discriminant function, linear, 179 
Dispersion, indices of, 255 
Distribution, binomial, 86 
chi-square, 122, 153 
conditional, 28, 193 
hypergeometric, 116
normal, 99 

of a function of a variable, 262 
of a correlation coefficient, 167 
of means, 139, 145 
of number of successes, 109 
of proportions, 110 
of runs, 339 
of sums of ranks, 333 
of sums of squares, 266 
of the difference of two means, 147 
of the difference of two proportions, 
149 

of the range, 291
of the variance, 268 
Poisson, 90 
rectangular, 97 
Student's t, 274

uniform, 97 

Distribution-free methods, 329 
Distribution function, 23 

for continuous variables, 37, 189 
for discrete variables, 23 


Error, probable, 141 
radial, 154 
size of, 49 
standard, 141 
two types of, 48 
Estimate, best, 232 
maximum likelihood, 58, 233 
minimax, 368 
unbiased, 228 
Estimation, 56, 228 

by confidence intervals, 234 
in analysis of variance, 311
maximum likelihood, 58 
of regression parameters, 207 
of ρ, 203
Estimator, 58 
Event space, 5 
Events, 6 

independent, 11 
mutually exclusive, 9, 10 
Expected value, 133, 135 
conditional, 323 
properties of, 135 

Extreme values, distribution of, 290 

Factorial function, 153 
F distribution, 285 

for analysis of variance, 304, 310 
for testing equality of two variances, 
285 

sketch of, 286 
use of tables for, 286 
Fraction defective, 114, 319 
lot tolerance, 319 
Frequency curve fitting, 250 
Frequency function, 22 
Bernoulli, 86 
binomial, 86 
Cauchy, 79 
chi-square, 153 
conditional, 28 
continuous, 32, 35 
discrete, 22 
F, 285 

hypergeometric, 116 
joint, 24 
marginal, 28 
multinomial, 118 
normal, 99, 198 
Poisson, 90 



rectangular, 97
Student's t, 274
Function, distribution, 23 
frequency, 22 
gamma, 153 
likelihood, 57 
moment generating, 84, 96 
power, 54 

Gamma function, 153 
Gaussian distribution, 99 
Geometric mean, 78 
Goodness of fit, 244 

degrees of freedom for, 250 
likelihood ratio test for, 376 
testing, 245, 347 

Histogram, 33 

Homogeneity, of means, 301 
of proportions, 257 
of variances, 225 
Hypergeometric distribution, 116 
Hypothesis, composite, 213 
simple, 213 
statistical, 46 
test of, 47, 212 

Independence of $\bar{x}$ and $s^2$, 383
Independent events, 11
Independent random variables, 26 
sum of, 138 

Indices of dispersion, 255 
Inductive inference, 45 
Inspection, minimum, 321 
sampling, 318 

Jacobian, 381 

Joint frequency function, 24 
for continuous variables, 38 
for discrete variables, 24 

Kolmogorov-Smirnov statistic, 345

Least squares, 169 
Likelihood function, 57 
Likelihood ratio tests, 220, 222, 376 
Limits, confidence, 236 
Linear discriminant function, 179 
Linear hypothesis, 300 


Linear regression, multiple, 172, 208
simple, 168, 171 
Location, measures of, 70 
Loss function, 368

Lot tolerance fraction defective, 319 

Marginal distribution, continuous, 191 
discrete, 28 
normal, 199 

Maximum likelihood estimate, 58
properties of, 234 
Mean, best estimate of, 381 
computation of, 71

confidence limits for, 275 
control chart for, 146
distribution of, 139, 145

empirical, 70 
theoretical, 83 

Mean deviation, 78

Means, difference of two, 14"^, 276 
Median, 77 
test for, 333 
Minimax estimate, 368 
Minimum single sampling, 321
Mode, 78

Model, components of variance, 307
linear hypothesis, 300 
mathematical, 45 

Moment generating function, 84, 96
of sum of independent variables, 138
properties of, 97 

relation to frequency function, 108 
Moments, empirical, 70, 73
of multivariate distributions, 135 
product, 196 
theoretical, 83, 95 
Multinomial distribution, 118

Multiple classification, 363 
Multiple decision methods, 3^2 
Multiple linear regression, 172, 208 
Multiple regression coefficients, 172, 
20 ^ . . . , 
confidence limits for, 283
Multiplication theorem, 11 
Multivariate distributions, 133 

Neyman-Pearson lemma, 214
Nonparametric methods, 329
Kolmogorov-Smirnov statistic, 345
rank sum test, 333 



runs, 335
serial correlation, 341 
sign test, 330 
Normal correlation, 203 
Normal curve, standard, 103 
Normal distribution, of one variable, 99 
approximation to binomial, 104, 
109 

fitting to histograms, 102 
moment generating function, 101 
moments of, 99 
properties of, 101 
sketch of, 99 
standard, 103 
of two variables, 198 

conditional distribution for, 200 
geometry of, 202 
marginal distribution for, 199 
Normal equations of least squares, 174 
Normal regression, 205 
Normal surface, 201 

Orthogonal polynomials, 176 

Peakedness, measure of, 77 
Percentage defective, 319 
Percentages, difference of, 149 
distribution of, 110 
Permutations, 17 

for some elements alike, 19 
Plane of regression, 173 
Poisson distribution, 90 

approximation to binomial, 91
sketch, 91 

Poisson index of dispersion, 257 
Polynomial regression, 175 
Polynomials, orthogonal, 176 
Population, 64 
Power curve, 54 
Power function, 54 
Probable error, 141 
Probability, 4 

addition theorem, 9 
conditional, 11
definition of, 7 
density, 37 

multiplication theorem, 11
Probability ratio test, 354 
Producer’s risk, 321 
Product moments, 196 


Proportions, difference of, 149 
distribution of, 110 

Quality control chart, for means, 146 
for percentages, 114 

Randomization, 220, 297 
principle of, 345 

Randomness of sequences, testing by 
runs, 335 

testing by serial correlation, 341 
Random sampling, 132 
Random variable, 22 
continuous, 33 
discrete, 22 

Random variables, independent, 26 
Range, 78 

distribution of, 288, 291 
relation to standard deviation, 292 
Rank sum test, 333 
Ratio tests, likelihood, 222, 376 
probability, 354 
Rectangular distribution, 97 
Regression, curve of, 194 
curvilinear, 175 
functions for, 177 
linear, 160, 168 
multiple, 172, 208 
normal, 205 
polynomial, 175 
Regression coefficient, 171 
confidence limits for, 279 
Regression line, 171 
Regression plane, 173 
Replication, 297 
Representative sampling, 317
Risk, consumer’s, 320 
function, 368 
producer’s, 321 
Runs, 335 
distribution of, 339 
tables for, 340 

Sample, 64 
Sample space, 4 
probabilities, 5 
Sampling, random, 132 
representative, 317 
single, 319 
stratified, 316 





Sampling inspection, 318 
Scatter diagram, 160 
Sensitivity, 297 
Sequential analysis, 352 
approximations in, 359 
expected sample size, 361 
for binomial distribution, 356
for normal distribution, 355 
probability ratio test, 354 
Serial correlation, 341 
Significance, 111 
Sign test, 330 
Simple hypothesis, 213 
Single sampling, 319 
minimum, 321 
Skewness, 69, 77 
Space, sample, 5 
Standard deviation, 73, 83
computation of, 74 
interpretation of, 75 
relation to range, 292 
Standard error, 141 
Standard normal curve, 103 
Standard unit, 102, 162 
Statistic, 212 
Statistical hypothesis, 46 
test of, 47 

Statistical inference, 45 
Statistical methods, nature of, 45 
Stratified sampling, 315 
Student’s t distribution, 274 
applied to means, 275 
applied to regression, 279 


sketch of, 275
Sums of squares, distribution of, 266

t distribution, 274 

see also Student’s t distribution 
Tables, contingency, 252 
for other variables, 398-411 
for range, 292 
for runs, 340 

Test of a hypothesis, 47, 212
Tests, best, 214 
kinds of, 213 
likelihood ratio, 222, 376 
principle for selection of, 49 
sequential, 354 
Two types of error, 48 
size of, 49 

Transformations, 381 

Unbiased estimates, 228
Uniform distribution, 97 

Variable, change of, 119
random, 22

Variance, 73 

computation of, 74
confidence limits for, 270
distribution of, 268
unbiased estimate of, 229

Variances, testing equality of, 225
Variation, measure of, 72 

$\chi^2$, see Chi-square distribution; Chi-square test







